Magnetohydrodynamic Simulation Code CANS+: Assessments and Applications

We present a new magnetohydrodynamic (MHD) simulation code with the aim of providing accurate numerical solutions to astrophysical phenomena where discontinuities, shock waves, and turbulence are inherently important. The code implements the HLLD approximate Riemann solver, the fifth-order-monotonicity-preserving interpolation (MP5) scheme, and the hyperbolic divergence cleaning method for a magnetic field. This choice of schemes significantly improved numerical accuracy and stability, and saved computational costs in multidimensional problems. Numerical tests of one- and two-dimensional problems showed the advantages of using the high-order scheme by comparing with results from a standard second-order TVD MUSCL scheme. The present code enabled us to explore long-term evolution of a three-dimensional accretion disk around a black hole, in which compressible MHD turbulence caused continuous mass accretion via nonlinear growth of the magneto-rotational instability (MRI). Numerical tests with various computational cell sizes exhibited a convergent picture of the early nonlinear growth of the MRI in a global model, and indicated that the MP5 scheme has more than twice the resolution of the MUSCL scheme in practical applications.


Introduction
In the last decades, computational astrophysics has emerged together with the rapid growth of computational capabilities, enabling us to reveal various aspects of astrophysical phenomena that cannot be provided by observations. Among them, magnetohydrodynamic (MHD) simulations are powerful tools for understanding space and astrophysical phenomena such as solar flares and coronal mass ejections, auroral substorms in the terrestrial magnetosphere, magnetic dynamos in accretion disks and associated jet accelerations. In most cases, the systems are dominated by various MHD discontinuities, shock waves, and turbulence in which the fluid dynamics are strongly coupled with the magnetic field. Such highly nonlinear systems have stimulated the search for numerical algorithms that can solve the MHD equations with both high accuracy and stability.
2 Early development of MHD simulation codes was based on finite difference schemes with artificial and/or numerical dissipation, such as the modified Lax-Wendroff scheme (Rubin & Burstein 1967;Shibata 1983). The high-order upwind scheme was implemented in the ZEUS code (Stone et al. 1992;Hawley & Stone 1995) and has been used for many applications in astrophysics owing to its simplicity and flexibility. However, besides these early successes in computational astrophysics, overcoming spurious (grid) oscillations in highly stratified, compressible MHD applications has remained a task for further improvement in the numerical schemes (e.g., Kudoh et al. 1999).
Modern MHD simulation codes have a strategy of accurately capturing shock waves as situations often accompany with supersonic flows in space and astrophysical phenomena. Such shockcapturing schemes are based on the finite volume method in which the time evolution of cell-averaged conservative variables of the MHD equations is calculated from numerical fluxes at the cell surfaces.
Thus, the accuracy and the robustness rely on the method of obtaining the numerical flux from the cell-averaged conservative variables.
In the numerical flux calculation, upwind natures are incorporated by solving a Riemann problem at the cell interface initiated with the two neighboring cell states. The so-called Godunov schemes are based on the idea that the numerical flux can be obtained by integrating the conservative variables in the Riemann fan following the conservation laws in space and time. For numerical solutions to the Riemann problem, various approximate Riemann solvers have been developed including the Roe's scheme (Roe 1981) and a family of Harten-Lax-van Leer (HLL) solvers (Harten et al. 1983;Li 2005;Miyoshi & Kusano 2005). Among them, the HLLD approximate Riemann solver of Miyoshi & Kusano (2005) is now the standard method in modern MHD codes (see e.g., Kritsuk et al. 2011) as it provides tractable and stable numerical solutions.
Special care must be taken in multidimensional MHD simulations as the solenoidal property of the magnetic field cannot be straightforwardly satisfied, and numerical errors from the divergence-free condition severely affect results, in particular, when solving the conservative forms of the MHD equations numerically. While divergence cleaning methods aim at maintaining the numerical errors within minimal levels by making modifications to the base equations (Brackbill & Barnes 1980;Powell et al. 1999;Dedner et al. 2002), the constrained-transport (CT) algorithm (Evans & Hawley 1988) accomplishes the divergence-free condition within a level of machine round-off errors by adopting a special discritization for the magnetic field. Nowadays, most MHD codes employ variants of the CT algorithm while preserving the upwind nature for the induction equation of the magnetic field (Londrillo & Del Zanna 2004;Gardiner & Stone 2005;Lee & Deane 2009;Miyoshi & Kusano 2011;Minoshima et al. 2015;Lee et al. 2017;Minoshima et al. 2019). For further information, readers may consult comprehensive comparisons of the schemes by Tóth (2000), and also by Miyoshi & Kusano (2011) with more recent schemes.
Extensions of the scheme to a higher-order accuracy have been accomplished by adopting piecewise high-order polynomials to reconstruct variables' profiles in each cell used for the local Riemann problem at the cell interface, with which the total variation diminishing (TVD) property has been necessarily incorporated by adopting various slope limiters, so that the profiles degrade to the first order around discontinuities. The so-called MUSCL scheme (van Leer 1979) and its extension of the third-order PPM scheme (Colella & Woodward 1984) have been widely implemented for practical uses in public MHD codes as FLASH (Lee & Deane 2009;Lee 2013), PLUTO (Mignone et al. 2007), and Athena (Stone et al. 2008). See also Kritsuk et al. (2011) for other MHD codes. Note, however, that the overall spacial accuracy of these CT-based MHD codes has been achieved only up to the second order because the original CT algorithm is based on the second-order discritization. Thirdorder scheme has been proposed in the framework of the upwind CT (UCT) with additional large computational costs arising from high-order reconstructions and solutions of the Riemann problem at the cell edge (Londrillo & Del Zanna 2004). A simple and cost-effective, fifth-order scheme was proposed by adopting the hyperbolic divergence cleaning method (Dedner et al. 2002) by Mignone et al. (2010). We have adopted a similar strategy as described in the present paper.
The Coordinated Astronomical Numerical Software, CANS, was developed for the Japanese astrophysical community by T. Yokoyama at the University of Tokyo and collaborators, and is publicly available with documents at the website http://www-space.eps.s.u-tokyo.ac.jp/ yokoyama/etc/cans/. The base schemes are the modified Lax-Wendroff scheme and the CIP-MOCCT scheme (Yabe et al. 2001;Kudoh et al. 1999), but the Roe-MUSCL scheme is also available.
Additional physics modules of heat conduction and radiative cooling, and many application modules are also included. A physical problem can be easily solved by choosing the numerical schemes and visualizing by prebuilt Interactive Data Language (IDL) procedures, from which researchers and graduate students benefit when starting their simulation studies. CANS has been used in space and astrophysical applications since its start in 2002 (e.g., Asai et al. 2004;Isobe et al. 2005;Hanayama et al. 2005;Tao et al. 2005;Tsubouchi 2009;Toriumi & Yokoyama 2011). CANS+ (CANS-plus) was designed based on the CANS philosophy, but has implemented the state-of-the-art numerical algorithms and parallelization. The codes are written in Fortran90/95 and are organized in a modular way as in CANS. Hybrid parallelization is implemented by using the MPI library and OpenMP for effective parallel scaling on modern massively parallel supercomputer systems. Scripts for reading output data are prepared for both Python and IDL languages so users can choose either of these environments for analysis. Sample codes for visualization in Python using the Matplotlib modules and IDL are also available for quick analyses. The code can be downloaded from the CANS+ documentation website http://www.astro.phys.s.chiba-u.ac.jp/cans/doc/. In this paper, we detail the numerical algorithms adopted in CANS+ in Section 2 and show assessments of the code's capability with various physical problems in Section 3. As an application of CANS+, we present three-dimensional (3D) global simulations of a black hole accretion disk and discuss how the selection of the numerical schemes affects the angular momentum transport and the resulting mass accretion rate during its long-term evolution in Section 4. A summary of the paper and future perspectives are given in Section 5.

CANS+
2.1 Basic equations CANS+ solves the MHD equations in a normalized, semiconservative form as where ρ, v, g, B, and e are the mass density, the velocity, the gravitational acceleration, the magnetic field, and the total energy density, respectively, I is a unit tensor η is the resistivity, j = ∇ × B is the current density, and p t represents the total (thermal and magnetic) pressure defined as The thermal pressure p is given by where γ is the specific heat ratio.
The equation (5) for ψ with equation (3) is introduced so that the solenoidal property of the 5 magnetic field is maintained within minimal errors during time integration (see subsection 2.4). The equations (1) -(5) complete a set of the MHD equations, which is conventionally referred to as GLM (generalized Lagrange multiplier)-MHD equations (Dedner et al. 2002).
The GLM-MHD equations then read ∂U ∂t + s=x,y,z where is a state vector of the conservative variables, are a flux vector and the Kronecker delta, respectively, and is a source vector. We also define a vector V for the primitive variables Let us consider equation (10) with S = 0 in one dimension (1D) (s = x). It can be written in the form ∂U ∂t and are the Jacobian matrix for the conservative and the primitive variables, respectively, and is the quasilinear conversion matrix, which relates δV to δU . Multiplying equation (16) by the left eigenvector (L p with L p R p = I) of the Jacobian matrix gives the equation for the characteristic variables W as where δW = L p δV , and L p A p R p = Λ = diag(λ 1 , λ 2 , ..., λ 9 ) is a diagonal matrix containing the eigenvalues. For the GLM-MHD equations, λ 1 ∼ λ 9 are where refer to the fast (positive) and slow (negative) magnetosonic speeds, respectively, and is the Alfvén speed. In addition to the fast (λ 2,8 ), the Alfvén (λ 3,7 ), the slow (λ 4,6 ), and the entropy (λ 5 ) waves of the ideal MHD system, the GLM-MHD system introduces two waves with the eigenvalues (phase speeds) of λ 1,9 = ∓c h , which represent omnidirectional propagation of numerical errors of the divergence-free condition (equation (9)).

Numerical algorithms
Time evolution of the equation (10) for the state vector at a cell U i,j,k is obtained by the finite volume method whereŪ is the cell-averaged value of U , ∆ s with s = i, j, k represents cell width in each dimension, F * (s±1/2) denotes a numerical flux at cell surfaces, and L(Ū ) defines a numerical operator of the finite volume method. Equation (24) is integrated with time by the third-order strong-stability-preserving (SSP) Runge-Kutta (RK) scheme (Suresh & Huynh 1997;Gottlieb & Shu 1998): where the superscript n stands for a number of time steps and ∆t n is a corresponding step size. ∆t n is determined from the Courant-Friedrichs-Lewy (CFL) condition where λ s2,8 (s = i,j,k) is the eigenvalue for the fast magnetosonic wave calculated for each dimension, and σ c is the CFL number. For the resistive MHD case, i.e., η = 0, ∆t from equation (30) is again compared with the one determined from the diffusion number σ d ∆t = min ∆t, σ d min In the following numerical experiments, σ c = σ d = 0.3 were adopted unless otherwise stated.
The numerical operator L(Ū ) is evaluated by following procedures.
1. Conversion from the cell-averaged primitive variables to the characteristic variables locally (Harten et al. 1987). This operation is done by multiplying the primitive variables at cells within a stencil centered at a cell (i, j, k) by the left eigenvector calculated at the cell (i, j, k). For example, in the x-direction, with q varying from −r to +r for the (2r + 1)th-order scheme. Among various eigenvectors, CANS+ adopts the same eigenvectors as those used in the Athena code (Stone et al. 2008).
Although this conversion is computationally expensive, separating waves according to the characteristics is necessary for obtaining nonoscillatory profiles by higher-order reconstructions as in the following.
8 2. Reconstruction of the characteristic variables W within a cell. To obtain high-resolution, nonoscillatory profiles, the fifth-order-monotonicity-preserving (MP5) interpolation scheme (Suresh & Huynh 1997) is used (see subsection 2.3). In the MP5 scheme, a five-point stencil (r = 2) is required for the conversion and the reconstruction steps. The interpolated values at both cell surfaces are obtained at once in each dimension. For example, for a cell at (i,j,k) in the x-direction, the left-hand, right-state vector R W (i−1/2,j,k) and the right-hand, left-state vector L W (i+1/2,j,k) can be obtained for each interpolation procedure.
3. The interpolated values at cell surfaces are then converted to the primitive variables using the local right eigenvector as where R ps with s = x, y, z represents the right eigenvector in each direction.
4. Evaluation of the numerical flux F * in each dimension by the Godunov-type scheme. That is, the numerical flux at a cell surface (i + 1/2, j, k), for example, can be obtained from where U R x−x i+1/2 ∆t ; LV i+1/2 , R V i+1/2 is a solution inside the Riemann fan during a time interval of ∆t with the initial conditions of the left ( L V i+1/2 ) and right ( R V i+1/2 ) states. (Here we have omitted (j, k) in the notations for simplicity.) CANS+ employs the HLLD approximate Riemann solver of Miyoshi & Kusano (2005), which gives accurate and robust solutions to Riemann problems. Note, however, that equations for the magnetic field component normal to the cell surface and for ψ are decoupled from the seven remaining equations. The numerical flux values for these variables are easily obtained separately (see subsection 2.4).
6. Conversion from the updated conservative variables to the primitive variables.
7. Return to 1. The MP5 scheme in the reconstruction step is based on a fourth-degree polynomial for each dimension. The left state of a quantity f at a cell i + 1/2 is first evaluated by wheref is a cell-averaged value of f . The right state at i − 1/2 is also obtained at the same time from its symmetry in each dimension: The original value results in a fifth-order spatially accurate solution in smooth regions, but an oscillatory profile around discontinuities. The MP5 scheme seeks a profile that degrades to the first order only around discontinuities. For this purpose, the original value is then brought to an interval using the median function Here the median function returns an intermediate value among three variables in the arguments, so that the cell surface value lies in an interval between f min and f max . The basic concept is therefore the same as in standard TVD schemes. The TVD schemes with a three-point stencil, however, cannot distinguish between a discontinuity and a smooth profile around the extremum (figure 1). The  & Huynh (1997) for detailed derivations of f min and f max , and the CFL restriction. Implementation of the MP5 scheme to CANS+ for practical uses is given in Appendix 1.  Suresh & Huynh (1997) .

Hyperbolic divergence cleaning method
Upwind schemes including the finite volume method always suffer from numerical errors of the solenoidal property of the magnetic field. This occurs because the Lorentz force term in the momentum equation (equation (2)) involves an acceleration term along the magnetic field arising from numerical errors of ∇ · B; that is, which finally leads to undesired results and thus inhibits following the long-term evolution of multidimensional problems. To overcome this inherent problem in multidimensional MHD simulations, several approaches have been developed (Tóth 2000;Miyoshi & Kusano 2011). Among them, modern MHD simulation codes have adopted strategies to satisfy discretized forms of the divergence-free condition within machine round-off errors based on a family of upwind constrained-transport (CT) algorithms (Londrillo & Del Zanna 2004;Gardiner & Stone 2005;Mignone et al. 2007;Lee & Deane 2009;Miyoshi & Kusano 2011;Minoshima et al. 2015;Lee et al. 2017;Minoshima et al. 2019). In contrast, CANS+ adopts the hyperbolic divergence cleaning method (Dedner et al. 2002;Mignone et al. 2010), which transports and damps numerical errors of the divergence-free condition, so that the growth of errors is managed to be within minimal levels. More specifically, multiplying the equation (5) by ∂/∂t and equation (3) by ∇·, one finds that the scalar potential ψ and ∇ · B obey the telegraph equations where c h and c p are constants, and characterize the propagation speed and the damping rate, respec-tively.
Equation (5) has a semiconservative form and thus its numerical solution can benefit from the same high-order scheme applied to the original MHD equations. Let us consider the corresponding equations: The system equations without the source term on the right-hand side in each dimension are decoupled from the MHD equations, and have eigenvalues of λ 1,9 = ∓c h and the right eigenvectors of ( 1 ±c h ) T . As c h is a constant, they are linear equations. Thus, the numerical flux at a cell surface is given as where L ψ, L B s , R ψ, and R B s denote the left (L) and the right (R) states at the cell surface for each quantity, respectively. These can be obtained in the same manner as in the reconstruction step. The obtained numerical flux is added to the numerical flux from the HLLD Riemann solver.
The contribution from the source term ( 0 −ψc 2 h /c 2 p ) T is then added in an operator-split fashion as where C 1 and C 2 are the weighting coefficients, L F V represents the finite volume differentiation in equation (46), and the superscripts on ψ, C 1 , and C 2 denote the Runge-Kutta substage (m) in equations (25)-(28).
The constants of c h and c p can be arbitrary defined. As c h represents the propagation speed of numerical errors, it is typically determined from the CFL condition as Then, c p is determined from the relation c 2 p /c h = const. so as to equal the transport and decay time scales. Numerical experiments have shown that c p = √ 0.18c h results in the best performance regardless of grid resolution (Dedner et al. 2002) and was adopted in CANS+.

Source terms
The source terms on the right-hand side of equations (2) -(4) are evaluated at the cell center at each Runge-Kutta substage. (ηj) and (ηj × B) in the resistivity terms are first evaluated at the cell center by the second-order finite difference for the current density j = ∇ × B, then they are added as a numerical flux at the cell surface by the arithmetic mean of the two neighboring cell-center values. Therefore, the resistivity terms degrade the overall spatial accuracy because of its secondorder representation (more details are shown in Appendix 3).

Some prescriptions for numerical stability
Higher-order, multidimensional codes usually suffer from vanishing and negative values of the scalar variables (ρ and p). CANS+ ensures numerical stability by examining the following prescriptions.
• In the reconstruction step, the first-order interpolation is applied for the scalar variables and the normal component of the vector fields: if one of the scalar variables recovered from the interpolated characteristic variables at a cell surface resulted in a negative value, or if one of the scalar variables at a cell centered in the five-point stencil was two orders of magnitude smaller than the other cell values.
• In the conversion step, the thermal pressure given by equation (8) is evaluated every time in the Runge-Kutta substage. If the pressure becomes negative, it is overwritten by the value in the previous substage. The pressure is also evaluated in terms of the plasma beta (β = 2p/B 2 ) so as to bound the minimum β value. CANS+ allows β ≥ β min = 0.001. The total energy density is then updated by the new pressure value. Thus, it is not strictly conserved in CANS+, while the mass conservation is satisfied.
These operations can provide stable solutions in rarefied and low-β plasma regions as demonstrated in the two-dimensional (2D) simulations of the Parker instability as will be shown in 3.2.4.

CANS+ in cylindrical coordinates
CANS+ in cylindrical coordinates (R, φ, z) bases on the equations in a semi-conservative form as F U and S U are the flux and the source term (equation (14)) corresponding to each conservative variable (U = ρ, ρv, B, e, ψ).
The conservation equations are discretized in a form, for example, The source term inherent in a curvilinear coordinate system (the first term on the right-hand side of equation (51)) is evaluated by taking the arithmetic mean of cell-surface values as (F ρv φφ(i,j−1/2) + F ρv φφ(i,j+1/2) )/2. This source term also degrades the overall spatial accuracy because of its secondorder representation.
Special care is taken in the MP5 reconstruction step for the code in cylindrical coordinates. Mignone (2014) showed that incorporating the curvature of the cell into the piecewise polynomial reconstruction, namely, the volume-weighted reconstruction, greatly improved the solutions near the origin of the coordinate axis (along the z-axis in cylindrical coordinates). This technique has been implemented in the cylindrical version of the code (more details are shown in Appendix 2).
The cylindrical version of the code has been used in global simulations of accretion disks, as shown in Section 4.

Code Assessments
In this section, we present results from numerical tests to assess CANS+ by comparing with results from the second-order MUSCL scheme with the monotonized central (MC) limiter and the secondorder SSP-RK scheme (Gottlieb & Shu 1998). For specific heat ratio, gravity, and resistivity, γ = 5/3, g = 0, and η = 0 were adopted in the following tests unless otherwise stated. Dashed lines indicate the order of accuracy in space.

Alfvén wave propagation
The spatial resolution of the MP5 scheme in CANS+ was verified by 1D tests of circularly polarized Alfvén wave propagations with various numbers of computational cells per wavelength. We initially set the magnetic and velocity profiles as where we have used units of the system size L x = 1, the background Alfvén speed V A0 = 1, and the Alfvén transit time T 0 = L x /V A0 . The CFL number σ c = 0.05 was adopted in the following analyses on spatial resolutions to minimize errors from the time integration. Otherwise, the solution becomes the third order of the SSP-RK scheme under fixed CFL conditions.
Figure 2(a) shows spatial profiles of B z obtained by the MP5 (red) and the MUSCL scheme (blue). The wavelength is resolved with 16 cells. After five Alfvén transit times (t = 5), while the amplitude of the wave has decreased by 25% in the MUSCL scheme, the profile from the MP5 scheme almost overlapped the theoretical profile (black). This strong damping of the wave is an inherent property of the TVD schemes. In contrast, the MP5 scheme can detect smooth profiles and  schemes. The numerical errors increased with cell width according to the degree of the interpolation polynomial of each scheme. As expected, the dashed lines indicate that the MUSCL and the MP5 schemes have the second-(∝ ∆x 2 ) and fifth-order (∝ ∆x 5 ) accuracy in space, respectively.

Shock tube problem
The monotonicity preservation of CANS+ was verified by a standard shock tube problem (Brio & Wu 1988;Ryu & Jones 1995). We initially set for this shock tube problem the values (Ryu & Jones 1995). We used 512 computational cells for this problem. and we compared them with the MUSCL scheme (blue) and the reference result obtained by the firstorder scheme with 8192 cells (black). Characteristic profiles (from left to right: fast rarefaction, slow compound, contact discontinuity, slow shock, and fast rarefaction) were reproduced by CANS+. In particular, the contact discontinuity (x ∼ 0.56) and the slow shock wave (x ∼ 0.63) were resolved more sharply (figures 3(b) and 3(e)) than in the MUSCL scheme. A staircasing profile around the compound wave can be found when the MP5 scheme was combined with the HLLD Riemann solver.
A smoother profile was obtained with the HLL solver (not shown) as previously reported with the global Lax-Friedrichs scheme (Mignone et al. 2010).
Figures 3(c) and 3(f) compare the results from the MP5 scheme with different variables for the interpolation to cell surfaces. The reconstruction of the characteristic variables (black) that was used in CANS+ resulted in the best performance to capture the discontinuities among other variables used for the reconstruction, including the primitive variables shown in the figure (red). Interpolation of the characteristic variables, which were actually transported by the waves, was necessary for obtaining nonoscillatory results with the present fifth-order scheme.

Alfvén wave propagation
The capability of CANS+ is demonstrated with the circularly polarized Alfvén wave propagation in two dimensions. By using various numbers of computational cells per wavelength, we verified the actual spatial resolution in multi dimensions. We initially set the magnetic and velocity profiles as where λ = 1 is the wavelength, l is the coordinate in the direction of the wave propagation, and t1 and t2 are the transverse components. x-, y-, and z-components of the magnetic field (velocity) are thus initially given as where θ is the propagation angle with respect to the x-axis. The simulation box sizes are L x = 1/ cos(θ) and L y = 1/ sin(θ). The CFL number σ c = 0.01 was adopted to minimize errors from time Numerical experiments with various cell widths in figure 4(c) again shows the high capability of the MP5 scheme in two dimensions. The errors from the MP5 scheme decreased as the cell's size gets smaller following a slope expected from the fifth order interpolation. It is also worth mentioning that the numerical errors are much larger in the MUSCL scheme than in the MP5 scheme when compared at the same cell size ( figure 4(c)). In other words, the MP5 scheme can provide numerical solutions at the same accuracy with much coarser cells than those required in the MUSCL scheme.
This result has great benefits for saving numerical costs because they increase as ∆x −(n+1) in ndimensional simulations, while the number of numerical operations in the MP5 scheme increases by a factor of two from those in the MUSCL scheme (see subsection 3.3).

The Kelvin-Helmhotlz instability
Evolutions of the Kelvin-Helmholtz (K-H) instability in inhomogeneous plasma are presented as a multidimensional problem. We initially set the velocity shear profile as V x = −0.5V 0 tanh(y/λ), where V 0 was 1.6 times the Alfvén speed V A , λ was the half thickness of the velocity shear layer, and the mass density profile was set as ρ = 0.

The Orszag-Tang vortex
We present results of the so-called Orszag-Tang vortex problem (Orszag & Tang 1979) to test the code's capability of capturing shock-shock interactions and turbulence. We initiated the problem   6(b)). The overall structures seem to overlap with each other as characterized by sharp discontinuities at shock waves. Besides the overall structure, a closer look at the profile along the x-direction at y = 0.5π (figure 6(c) ) shows that the MP5 scheme (red filled circles) tended to follow a fine-scale structure of the reference run (x ∼ 0.75, cyan line) but with a oscillatory profile, whereas the the smooth profile was obtained by the MUSCL scheme (blue circles).
Figures 6(d) and 6(e) show profiles of divergence errors of the magnetic field. Here we introduced a normalized quantity defined by |B∇ · B| where F * ρvx and F * ρvy denote the numerical flux for the xand y-components of the momentum equations, respectively, to quantify the impact of the divergence errors to the dynamics (equation (43)). Because CANS+ adopts the hyperbolic divergence cleaning method, non-negligible errors (a few to 10%) persist locally, in particular at shock wave fronts in both schemes. The errors are occa- sionally comparable to or greater than the physical force in very localized regions. This caveat that the divergence errors tended to be larger at shocks must be considered in multidimensional problems.

Parker instability
Magnetized plasma stratified under the gravity becomes destabilized as a result of the buoyancy force against the magnetic tension force. This instability is known as the Parker instability (Parker 1966), and its nonlinear evolution is characterized by bent magnetic field lines in rarefied plasma (Matsumoto et al. 1988). We chose this problem as a benchmark test for evaluating the code's capability of solving low-β plasma in multidimensions. In the present test, the gravity acceleration term g on the right-hand side of equation (2) was retained.
The Parker instability was initialized with the gravity acceleration profile and the temperature The mass density was determined from the static equilibrium where β 0 is the ratio of the thermal pressure to the magnetic pressure, and p = ρT /γ. The magnetic field has only the x-component (B x ) with strength given by the plasma beta β 0 . The spacial length, velocity, and time were normalized to the scale height H 0 , the sound speed C s0 at y = 0, and the transit time H 0 /C s0 , respectively. Other quantities such as g, ρ, p, T , and B x were normalized to C 2 s0 /H 0 , ρ 0 , ρ 0 C 2 s0 , C 2 s0 /(γ/m), and ρ 0 C 2 s0 , respectively. We adopted γ = 1.05, g 0 = 1.47, H g = 5.0, H t = 0.5, y 0 = 10.0, T 0 = 1, T 1 = 25, and β 0 = 1.0.
With these initial configurations, we added a perturbation to V x in an antisymmetric form with respect to the y = 0 plane as

Magnetic reconnection
When magnetic field lines are imposed to form an antiparallel geometry, the magnetic field topology changes through reconnection of the field lines. Because it accompanies the magnetic energy conversion to the plasma kinetic energy, the magnetic reconnection has been studied extensively to understand explosive phenomena, such as flares in the solar corona and pulsar winds, and terrestrial substorms. It is also an important process in dynamo processes in accretion disks. The topology change during reconnection is characterized by bent magnetic field lines and bipolar trans-Alfvénic jets from the reconnecting "x" point, and the resulting plasmoid evolution. turbulence associated with the plasmoid evolution. To initiate the reconnection, the resistivity η terms on the right-hand side of equations (3) and (4) were retained.
The plasmoid motion in the x-direction pushes the stationary plasma away from the current sheet in 85 < x < 88 to y ∼ ±2, making the velocity shear between the swept and surrounding plasmas.
This velocity profile can be a source of turbulence through excitation of the K-H instability (Zenitani & Miyoshi 2011).

Parallel scaling and computation efficiency
Hybrid parallelization is implemented into CANS+ using the Message Passing Interface (MPI) library and OpenMP. The code has been optimized on massively parallel supercomputer systems to run effectively. We examined a parallel efficiency of the 3D code on K computer at the RIKEN Center for Computational Science, Reedbush at the Information Technology Center, the University of Tokyo, and ATERUI II at the Center for Computational Astrophysics, National Astronomical Observatory Japan.
The K computer is composed of 82,944 computation nodes, and each node has eight processor cores. The node's peak performance is 128 GFLOPS (128 × 10 9 floating-point operations per second). For this system, intra-and internode communications were respectively realized by OpenMP directives and MPI libraries.
The Reedbush and ATERUI II systems are based on many Intel Xeon CPUs. The Reedbush system is composed of 420 computation nodes, and each node has 36 processor cores (18 cores ×2 CPUs). The ATERUI II system is composed of 1,005 nodes each of which has 40 cores (20 cores ×2 CPUs). The node's peak performances are 1.2 TFLOPS (1.2 × 10 12 FLOPS) and 3.1 TFLOPS, respectively. For these systems, four MPI processes and eight OpenMP threads were used in each computation node.   Figure 9(a)). The code with the MUSCL scheme also speeds up linearly with increasing numbers of cores (93.6% efficiency, blue solid line).
With the help of the core's SIMD (single instruction multiple data) capability, CANS+ runs efficiently also with respect to the system's performance. The code with the MP5 scheme runs at 15.3% on average to the peak performance of the K computer with the tested numbers of cores, whereas the code with the MUSCL scheme runs at 11.6%; the MP5 reconstruction is an efficient scheme in terms of floating-point operations. The performance on other systems with smaller BF ratios (ratio of the memory bandwidth to FLOPS) resulted lower efficiencies of 13.9% (MP5) and 10.9% (MUSCL) on the Reedbush system, and 7.2% (MP5) and 4.8% (MUSCL) on the ATERUI II system. .6 µs/cell/core/step, respectively) and the ATERUI II (4.0 and 2.6 µs/cell/core/step, respectively). Note, however, that the ratio of the computation time of the MP5 run to the MUSCL run decreases from 12.8/6.5 ∼ 2.0 on the K computer to 4.0/2.6 ∼ 1.5 on the latest low-BF-ratio system. The performance results and corresponding system's specifications are summarized in Table 1.
Because the MP5 reconstruction implemented in CANS+ requires more floating-point operations per cell than those for the second-order MUSCL scheme, it is essentially a less memory-intensive code. In other words, the code takes less time to load the required data from the physical memory than that for floating-point operations. This is a good property for recent low BF ratio systems. In general, the high-resolution (greater than the fifth order) code is suitable for massively parallel, petascale to exascale supercomputer systems because of its high efficiency.

Application to Global Simulations of a Black Hole Accretion Disk
In this section, we present global simulations of an accretion disk around a black hole as an application of CANS+. The long-term evolution was characterized by a sharp contact discontinuity between the hot, dilute corona and the cold, dense, rotating disk, compressible magnetic turbulence via the magnetorotational instability (MRI), the resulting mass accretion (advection), and the periodic dynamo process through the Parker instability (Machida et al. 2013). All of these mechanisms were successfully solved by adopting the HLLD approximate Riemann solver, the MP5 reconstruction, and the hyperbolic divergence cleaning method in CANS+.

Initial torus model
We examined the evolution of an accretion disk in cylindrical coordinates (R, φ, z) initially given as a torus threaded by the toroidal magnetic field embedded in non-rotating, hot and dilute plasma (corona) (Okada et al. 1989;Machida & Matsumoto 2003). General relativistic effects around the black hole are modeled by the pseudo-Newtonian gravitational potential (Paczyńsky & Wiita 1980), where r is radial distance from the black hole in spherical coordinates, G is the gravitational constant, M is the black hole mass, and r S is the Schwarzschild radius. Then the gravitational acceleration was obtained by g = −∇Φ.
The torus was initially given by setting a density profile (Nishikori et al. 2006) as where R 0 , ρ 0 and β 0 are the values at the center of the torus at which the mass density has a maximum value, K = p t /ρ γ t characterizes the polytropic relation between the gass pressure p t and ρ t inside the torus. L is the specific angular momentum in a functional form of where L 0 is the value of the Keplerian flow at R 0 and a is a constant. Ψ 0 is a potential energy at R = R 0 given by where Φ 0 is the gravitational potential at r = R 0 . This potential energy is a constant provided the initial toroidal magnetic field is in the form of (Okada et al. 1989) The mass density for the corona was given by (Nishikori et al. 2006) where C sc = γp c /ρ c is the characteristic sound speed in the corona. The total mass density is then given by ρ = ρ t + ρ c .
The periodic boundary condition was applied in the φ-direction, whereas the free boundary condition was applied in the z-direction and the outermost region in the R-direction. All physical quantities were absorbed inside the spherical region of r ≤ 0.2. This was accomplished by damping deviation of a physical quantity, q, from the initial state q 0 with a damping rate a i as (Machida & Matsumoto 2003)  and t = 45.0 (last column) are shown for the mass density (top row) and φand R-components of the magnetic field (middle and bottom rows). Profiles at the equatorial and meridian planes passing through the black hole are projected on the x-y, x-z, and y-z planes.

MP5 vs. MUSCL schemes
Using the same initial setup, we compared results from the MP5 and MUSCL schemes. Figure 10 shows the time evolution of the accretion disk solved by the MP5 scheme. Inside the torus threaded by the azimuthal magnetic field, the MRI exponentially grew (figures 10(a), 10(d), and 10(g)) until t = 20. The Maxwell stress (B R B φ ) generated by the MRI (figures 10(d), 10(e), 10(g), and 10(h)) enhanced outward momentum transport, resulting in continuous mass accretion into the black hole (figures 10(b) and 10(c)).
After nonlinear saturation of the MRI, the poloidal component of the magnetic field was cre- ated via the Parker instability, allowing escape of magnetic flux to the corona, which in turn caused a reversal of the sign of the azimuthal component inside the disk (Machida et al. 2013). This reversal of the toroidal magnetic field occurred periodically during its long-term evolution in the simulation run with the MP5 scheme. Figure 11(a) shows a butterfly diagram for the azimuthal component of the magnetic field averaged in the region 0.2 ≤ r ≤ 0.5 and 0 ≤ φ ≤ 2π. Reversals of the sign of the magnetic field at the disk center occurred after the preceding buoyant motions of the magnetic flux due to the Parker instability. This dynamo process persisted during the simulation run up to t ∼ 40.
When we examined the same problem with the MUSCL scheme, such a dynamo process occurred in the late phase of the evolution, as shown in figure 11(b). The TVD property of the MUSCL scheme evidently inhibited linear and early nonlinear growths of the MRI. This drawback is also visually recognized in figure 12, in which magnetic turbulence suffered from strong numerical damping in the early stage (figures 12(a) and 12(d)), and global-scale magnetic field survived in the late nonlinear stage (figures 12(c) and 12(f)).
where r 0 = 0.2 and θ are the radius at the inner boundary and the elevation angle in spherical coordinates, respectively. The mass accretion rate peaked at t ∼ 19 and gradually decreased and . turbulence in the disk, which can be quantified by the so-called α-parameter (Shakura & Sunyaev 1973) where <> stands for the volume-weighted average of the quantity in cylindrical coordinates. Figure   13(b) compares time histories of the α-parameter averaged in a region, 0.2 ≤ R ≤ 0.5, 0 ≤ φ ≤ 2π, and −0.25π ≤ arcsin(z/ √ r 2 + z 2 ) ≤ +0.25π. As shown in the mass accretion variation, the α-parameter was sustaind at a level of α ∼ 0.015 following an initial peak of α ∼ 0.1 at t = 19 in the MP5 run (red line). The α parameter variation from the MUSCL run (blue line) also exhibited a similar profile as in the MP5 run but with a peak in the late stage as was found in the mass accretion rate variation.

Convergece test
We have examined with the MP5 scheme under different spatial resolutions to assess the numerical convergence property for the present particular problem. The numerical parameters are summarized in Table 2 for each spatial resolution.    the same format as figure 13(b). Whereas overall the profiles looked similar among runs with different resolutions, the initial growth of the MRI is slow in the low-resolution run (dotted line), as was found in the run with the MUSCL scheme (blue line in figure 13(b)). We could obtain a convergent result of the MRI growth in the early stage up to t ∼ 10 with medium-(dashed line) and high-(solid line) resolution runs. Nevertheless, the spatial resolution used in the high-resolution run is not high enough to give convergent results for further long-term evolution in t > 10.

Summary and Discussion
We have developed CANS+, a high-resolution, numerically robust MHD simulation code by employing the HLLD approximate Riemann solver, the MP5 reconstruction method, and the hyperbolic divergence cleaning method. We performed a number of benchmark tests to show the code's capability for solving discontinuities, shock waves, and turbulence all of which are essentially important in astrophysical situations.
In 1D benchmark tests that included linear Alfvén wave propagation and shock tube problems, the adoption of a spatial fifth-order scheme gave superior results compared with a second-order scheme, even when the additional computational costs arising from the higher-order reconstruction were considered: The computation time increased by two times compared with the second-order scheme, but for the same grid resolution the numerical errors from the fifth-order scheme were smaller by orders of magnitude. In other words, to obtain solutions with the same accuracy, the fifth-order scheme required smaller numerical costs by orders of magnitude than the second-order scheme. This advantage is more prominent in multidimensional problems.
In 2D tests of the oblique Alfvén wave propagation, the K-H turbulence, the Orszag-Tang vortex problem, and the magnetic reconnection, it was shown that CANS+ enables solving discontinuities, shock waves, and turbulence with high accuracy and stability simultaneously. The test problem of the Parker instability also showed a high capability for solving very low-β (∼ 10 −3 ) plasma in which the numerical divergence errors of the magnetic field were maintained within reasonably low levels.
As an application of CANS+, we presented global simulations of an accretion disk around a black hole. With a given initial setup and grid resolution, CANS+ was capable of following the long-term evolution of the accretion disk in which the MRI and the resulting mass accretion into the black hole were sustained for 50 rotational periods. By increasing spatial resolution, we could obtain a convergent result of the early nonlinear growth of the MRI. The low-resolution run with the MP5 scheme gave a similar result to that in the medium-resolution run with the MUSCL scheme. Again, in practice, the MP5 scheme has at least twice the resolution of the MUSCL scheme, giving more than eight times gain in computation time (more than 2 4 times speed up with doubled computational costs) to obtain results with the same accuracy.
Lastly, we address the caveat of using fifth-order numerical schemes. The characteristic variables used for the reconstruction step showed the best performance with nonoscillatory results in the 1D shock tube problem. With other parameters, such as primitive variables, the updated conservative variables profiles were subject to numerical oscillations around discontinuities even if the MP5 reconstruction gave nonoscillatory profiles at cell surfaces. Thus, the reconstruction step not only required high computational costs because of the variable conversion, but also introduced difficulty in analytically obtaining the eigenvectors and the corresponding eigenvalues of the system. This can be problematic when extending the present MHD to, for example, the special relativistic MHD equations, in which the second-order TVD schemes have been adopted Takahashi & Ohsuga 2013). The search for variables that are universally applicable to high-order reconstruction in various systems of equations remains a task for further applications of CANS+.

Appendix 1 MP5 reconstruction in nonuniformly spaced cells
A piecewise fourth-degree polynomial is used in the MP5 reconstruction. For the uniformly spaced cells, the left state of a quantity f at a cell i + 1/2 is, for example, given by For more practical uses, in which the cell size is not necessarily uniform, CANS+ employs the Lagrange polynomial as to represent the spatial integral of the quantity f Thus, f can be obtained by taking the derivative of equation (A4) f Finally, the coefficients C r in equation (A1) can be obtained for x i+1/2 from f (x i+1/2 ) as Note that C r depends only on the cell size ∆x i+n and the cell surface's location x i+n/2 . Thus, they can be determined at initialization.

Benchmark test
We examined a circularly polarized Alfvén wave propagation in two dimensions using nonuniformly spaced cells as a benchmark test of the interpolation procedure of equation (A7). The initial setup is the same as that presented in sub-subsection 3.2.1, except we adopted the propagation angle of θ = 45 • . In this test, we used the following two different cell sizes where L x,y is the system's size in the xand y-directions.
By examining with different sizes of ∆h c , the accuracy of the code in nonuniformly spaced cells was obtained. We calculated the L 1 norm error by the following equation where (i, j) indicates the cell number in two dimensions and B 0(i,j) is the analytic solution at each cell-center location.  Here, we also plotted results from uniform cell cases (green line). We found that the error decreased following the slope expected from the fifth-order interpolation for ∆h c /λ > 0.1. We also note that the 37 errors are smaller in the nonuniform cases than in the uniform cases. However, the accuracy curve approaches the slope of the second-order accuracy for ∆h c /λ ≤ 0.1 in the nonuniform cases. The accuracy curve for the uniform cases followed the fifth-order slope all the way down to ∆h c /λ ∼ 0.01, as expected from the results in sub-subsection 3.2.1.
Appendix 2 MP5 reconstruction in cylindrical coordinates Mignone (2014) showed that incorporating the curvature of the cell into the piecewise polynomial reconstruction in curvilinear coordinates, namely, the volume-weighted reconstruction, improved the solutions near the origin of the coordinate axis (along the z-axis in cylindrical coordinates). This curvature effect is considered for the MP5 reconstruction in the R-direction in cylindrical coordinates.
In this case, equations (A4) and (A5) are modified as As for Cartesian coordinates, the coefficients C r are similarly obtained for R i+1/2 from f (R i+1/2 ) = 1 R i+1/2 dF dR | R=R i+1/2 as Note that the coefficient for the right state of the cell surface at R = 0 becomes infinity. This singularity at R = 0 is addressed by adopting the first-order reconstruction, i.e., R f 0 =f 1 .
Boundary condition at R = 0 In the following, we present results from benchmark tests to discuss how the two interpolation procedures of equations (A7) and (A14) result in different evolutions in cylindrical coordinates.
For boundary conditions across the z-axis, ρ, p, and (V z , B z ) were assumed axisymmetric profiles, whereas the antisymmetric boundary condition is applied to (V R , V φ , B R , B φ ) for the case with equation (A7) as usual. Conversely, with equation (A14), the symmetric and antisymmetric boundary conditions were applied to (V R , V φ , B R , B φ ) and (V z , B z ), respectively. We found this somewhat odd boundary condition resulted a best practice with equation (A14) in the following experiments. A naive idea behind this is that a cell volume R i ∆R i becomes negative in R < 0 by definition (equations (A12) and (A13)), and the sign of the vector quantities should be reversed when constructing the volume-weighted polynomial. Note, however, that the usual boundary condition should be used for other situations, such as when calculating the current density (Appendix 3). The special care for the boundary condition in cylindrical coordinates was also addressed in global simulations of a black hole accretion disk in Section 4.

Benchmark tests
1D magnetic confinement of a cylindrical plasma column First, we examined a static balance problem presented by Mignone (2014). In this test, a cylindrical plasma column was initially confined by the toroidal magnetic field of B φ , satisfying a radial force balance between the gas pressure (p) gradient and the Lorentz forces, as shown in the following equations p = p 0 (1 + R 2 /R 2 0 ) 2 , (A15) where p 0 = 1 at R = 0 and B φ = 1/ √ 2 at R 0 = 1. The initial cell-averaged quantities were numerically obtained by Simpson's rule.
The solutions were obtained in a 1D domain in 0 ≤ R ≤ L R for different spatial resolutions at a normalized time of t = 10. Here, the domain size was L R = 10, and we used 32 to 1024 computational cells to obtain the spatial accuracy of the code. The CFL number was fixed to σ c = 0.05.
For the present cylindrical case, the L 1 norm error was calculated by where ∆ cyl is the cylindrical volume of a 1D domain 0 ≤ R i ≤ R max : and i indicates the cell number. To highlight the differences between the two schemes, we adopted R max = 1.0. Obviously, V R (t = 0) = 0 in the present static balance problem. The error with the interpolation of equation (A7) is, however, about one order of magnitude larger than the error of the interpolation incorporating the cell's curvature effect (equation (A14)) in all spatial resolutions. We found great improvement in the solution, especially near the coordinate origin.

2D blast-wave propagation
Next we demonstrate a blast-wave propagation in a 2D (R-z) configuration. The initial setup is the same as that introduced by Mignone (2014), and constant background quantities were given as where C s0 = 4 × 10 −3 is the sound speed in the ambient medium. Inside the spherical region of 0 ≤ √ R 2 + z 2 ≤ 1.0, ρV R R 2 = 1.0 (A22) where C s1 = 3 × 10 −2 is the sound speed in the wind region. The initial cell-averaged quantities for these configurations were numerically obtained by Simpson's rule. The profiles in the spherical region were set constant in time so a supersonic flow blows out from the inner spherical region. Initialized by this setup, the time evolution was solved in a simulation domain that covered 0 ≤ R ≤ 10 and When using equation (A7) for the reconstruction in the R-direction, the forward shock was subject to a spurious deformation in addition to a trail in the rarefied region along the z-axis ( figure   17(a)). The growth of the numerical artifacts near the z-axis ceased and the solution was greatly improved ( figure 17(b)) by incorporating the curvature effect of the cell into the reconstruction (equation (A14)). Turbulent evolution of the Rayleigh-Taylor instability at the contact discontinuity between the sharp shock wave fronts highlights the superior capability of CANS+, which employs the MP5 scheme and the HLLD approximate Riemann solver.

Appendix 3 Accuracy of the resistivity terms
In this section, we present a convergence test of the magnetic diffusion problem presented by Matsumoto (2011) to evaluate the source term representation associated with a finite magnetic resistivity (equations (3) and (4)).
The current density j = ∇ × B is evaluated by the second-order central finite difference, for example, where ∆y is the cell size in the y-direction and (i, j) indicates the cell number in two dimensions.
Then, the current density at the cell surface is obtained by the arithmetic average of the two neighboring cell-center values and is added to the numerical flux F * of the ideal MHD part Here, we only focus on the diffusion term in the induction equation (F * = 0) This equation was solved in a 2D plane in 0 ≤ x ≤ 1 and 0 ≤ y ≤ 0.5 with the periodic boundary condition in each direction. The z-component of the magnetic field was initially provided by where k = 2π(1, 2) T and r = (x, y) T . The analytic solution to the present diffusion problem can be obtained as B z (t) = exp(−η|k 2 |t) sin(k · r), with which L 1 norm errors were calculated at a time of t = 4 × 10 −3 with various cell sizes from ∆x = ∆y = 1/8 to ∆x = ∆y = 1/256 under a fixed diffusion number of σ d = 0.3. We adopted η = 1.0. Figure 18 shows the L 1 norm errors as a function of various cell sizes. As expected, the errors decreased as the cell size gets smaller, following the second-order accuracy slope. Nevertheless, the resistivity terms become important in localized regions inside current layers (e.g., the magnetic reconnection region shown in sub-subsection 3.2.5), and the impacts of the lower-order representation to overall spatial accuracy are generally limited.