SADHANA: A Doubly Linked List-based Multidimensional Adaptive Mesh Refinement Framework for Solving Hyperbolic Conservation Laws with Application to Astrophysical Hydrodynamics and Magnetohydrodynamics

We report the development of a nested block-structured adaptive mesh framework to solve multidimensional, time-dependent hyperbolic equations encountered in astrophysics. An approach based on a tabular list is used to construct variants of Hilbert space-filling curves in an iterative fashion to maintain the connectivity of locally refined mesh configurations using a doubly linked list. Modifications are made to conventional boundaries of computational blocks to aid the adaptive mesh. We also describe a well-defined, computationally efficient data structure to hold self-similar mesh units for this purpose. The flexibility of this code is demonstrated by the performance of various Riemann solvers implemented in this computational framework.


Introduction
The search for a balance between the ever-increasing demand for higher resolutions of numerical solutions and the available computational power has resulted in the rise of the study of adaptive mesh refinement (AMR) algorithms on multiple CPU-based computational units. A successful implementation of the AMR algorithm can distribute the computational cost efficiently at discrete locations in the domain where higher resolutions are required. Since these discrete locations are usually governed by time-dependent local solutions, the required adaptive character of the mesh has to be dynamic and capable of changing with the evolution of the global solution itself. To achieve this, light-weight signatures of the complete or partial mesh structure are usually stored in a dynamic fashion to control the global mesh. These light-weight signatures are often termed proxy-data or meta-data, and the mesh adaption is performed using a replication and/or distribution of these proxy-data among different processes. In some cases, particularly when the requirement of higher resolution is scattered sparsely all over the domain, the task of mesh distribution among various processes may lead to a considerable increase in computational cost, and therefore, these mesh-adaption techniques need to be designed with care and caution. In practice, some form of indicator function is used to identify the regions in which mesh adaption is needed, and space-filling curves (SFCs), constructed from the proxy-data, are used to decompose the computational load among multiple processes. Although numerous variations exist, AMR for structured mesh or structured adaptive mesh refinement (SAMR) methods can be divided into three main categories, namely patch-based, cell-based, and block-based AMRs. The patch-based AMR was introduced by Berger & Oliger (1984) and Berger & Colella (1989) in their pioneering work on the AMR algorithm for Cartesian meshes.
The technique can be extended to multiple space dimensions, and the data structure required for this approach is explained in detail in Berger (1986). The computation begins with a coarse Cartesian grid that is considered to be the root-level mesh. Afterward, computational cells are marked for refinement as the calculation progresses. These newly refined individual cells are organized and maintained in different patches, and these cells are further refined, depending on requirements, creating additional nested patches. In a cell-based AMR, each cell may be refined individually and is stored using a tree-type data structure (quadtree for 2D and octree for 3D). This version of AMR was pioneered by Berger & Leveque (1989) and has been adopted, modified, and improved by various studies, including Powell et al. (1993), DeZeeuw & Powell (1993), Coirier & Powell (1995), Aftosmis et al. (1998), and Aftosmis et al. (2000). The flexibility of a tree-type data structure readily accounts for the local refinement of the mesh by keeping track of the computational cell connectivity as new grid points are generated by the refinement process (four and eight child cells in 2D and 3D, respectively). Most of these cell-based approaches have so far been restricted to a numerical solution of hyperbolic systems on Cartesian meshes, mainly because elliptic operators are more challenging to implement in this type of mesh structure. In a block-based AMR strategy, commonly known as block-structured adaptive mesh refinement (BSAMR), mesh adaptation is accomplished by the dividing and coarsening of entire predefined mesh blocks or groupings of cells. Although not required, each of the groups or blocks generally has an equal number of cells. As in the case of a cell-based AMR, tree-type data structures are commonly used for tracking the block connectivity and mesh refinement.
However, the tree structure needed in this case is lighter (i.e., it needs less information and is hence computationally cheaper) compared to cell-based AMR methods. As such, even considering the typically larger numbers of mesh cells created during the refinement process (i.e., typically more than the corresponding number of cells introduced in cell-based AMR approaches), block-based AMR methods are arguably easier to develop and maintain in terms of computational efficiency and scalability in parallel implementations via domain decomposition, as discussed in Groth et al. (1999).
There exist various software frameworks, each of which provides an AMR facility for a structured mesh, with multiple options for the detector function and load balancing. To name a few, BoxLib (Zhang et al. 2016), Chombo (Adams et al. 2016), PARAMESH (MacNeice et al. 2000), and the p4est library (Burstedde et al. 2011) are widely used general-purpose SAMR software packages. The key differences among these codes lie in the choice of SFC and in the design of the inherent data structures and algorithms that control proxy-data-related tasks among different processes during AMR operations. BoxLib, now succeeded by AMReX, is one of the oldest computational libraries and is devoted to building high-performance AMR applications. Chombo started as a fork of BoxLib in 1998 and eventually became a separate multipurpose package itself. Various codes used in the astrophysics community have started using these software libraries to add an AMR facility to the simulation. For example, the FLASH code (Dubey et al. 2009) has mesh-refinement support provided by the octree-based PARAMESH package, while Chombo is used in the finite-volume code Pluto (Mignone et al. 2012). Another numerical framework that has gained considerable interest in the astrophysical community is Cactus (Goodale et al. 2003), which uses Carpet (Schnetter et al. 2004) for its AMR operations.
Although the use of external frameworks makes it possible to implement an AMR facility in a particular code without any need of extreme care for the AMR strategy itself, the solver that is considered ends up having a strong dependence on the data structure that is used in these packages. As a result, a significant amount of time and effort is needed to go through and understand these aforementioned data structures before a specific AMR package can be used efficiently. Also, the code needs to be updated regularly to keep it compatible with newer versions of the AMR framework that is used. To avoid this hindrance, some codes, including Enzo (O'Shea et al. 2005; Bryan et al. 2014), RAMSES (Teyssier 2002), and NIRVANA (Ziegler 1999, 2008), have integrated their own AMR algorithms in these solvers. The development of the Enzo-P/Cello (Bordner & Norman 2012) code is a good example of the complications associated with stand-alone AMR code development even with an extremely capable group of developers, because it is a modification and redesign of Enzo in which special care is given to overcoming scaling bottlenecks. A full review of all AMR codes is beyond the scope of this paper. We refer to Dubey et al. (2014) for a study of various SAMR packages, their design, performance, and limitations.
The key contribution of the current work is in the direction of AMR-based code development without any major dependence on external libraries. Also, our work differs from most other related studies in the way we define the neighbors and in how, instead of using an explicit and well-optimized tree-type data structure, we use doubly linked lists to organize and maintain our AMR mesh. The doubly linked list data structure allows us to implement Hilbert-type SFCs in an implicit fashion to provide computationally efficient dynamic load-balancing performance according to the mesh adaption for the numerical solution. The current paper is organized in the following fashion: we first briefly review the governing equations to be used for the targeted physics suitable for our solver in Section 2; afterward, we give a detailed description of the SADHANA AMR solver in Section 3, followed by Section 4, where we consider various conventional test cases to show the practicality and versatility of the SADHANA AMR framework and test its computational performance. Finally, in Section 5, we summarize the work done and provide concluding remarks.

Governing Equations
SADHANA aims to solve a generalized system of hyperbolic equations with source terms in multidimensional form. Strictly speaking, in 3D Cartesian coordinates, the PDE system under consideration is

$$\frac{\partial \mathbf{U}}{\partial t} + \sum_{i=1}^{N_D} \frac{\partial \mathbf{F}_{x_i}}{\partial x_i} = \mathbf{S}, \tag{1}$$

where x_1 := x, x_2 := y, and x_3 := z are the directions, N_D is the dimension of the problem, U is a vector holding the conservative states of the dependent variables, F_{x_i} is the directional flux vector, and finally, S is a vector holding all the source terms to be considered. The value of U is computed from the corresponding Q vector holding the primitive states of the independent variables following a physics-based equation of state (EOS). Detailed structures and values of Q, U, F_{x_i}, and S are dictated by the physics of the problem in consideration. We now discuss how different types of governing equations can be implemented in the system expressed by Equation (1) to account for typical hydrodynamics and magnetohydrodynamics (MHD) problems.

Hydrodynamics
In an inviscid hydrodynamics system, the primitive and conservative state vectors, directional flux vectors, and the source vector are defined as

$$\mathbf{Q} = \begin{pmatrix} \rho \\ u_x \\ u_y \\ u_z \\ p_t \end{pmatrix}, \quad
\mathbf{U} = \begin{pmatrix} \rho \\ \rho u_x \\ \rho u_y \\ \rho u_z \\ E \end{pmatrix}, \quad
\mathbf{F}_{x_i} = \begin{pmatrix} \rho u_{x_i} \\ \rho u_{x_i} u_x + \delta_{i1}\, p_t \\ \rho u_{x_i} u_y + \delta_{i2}\, p_t \\ \rho u_{x_i} u_z + \delta_{i3}\, p_t \\ u_{x_i} (E + p_t) \end{pmatrix}, \quad
\mathbf{S} = \mathbf{0}.$$

It should be noted that while we have ignored gravity and any other possible forcing function, the source vector can easily be modified for a specific problem if needed (e.g., where gravity is needed, or when viscous effects must be considered). Following conventional symbols in fluid mechanics, the density, velocity, and thermal pressure are represented by ρ, $\mathbf{u} = (u_x, u_y, u_z)^T$, and p_t, respectively. The total energy E can be calculated as E = ρe_in + E_kin, where the specific internal energy e_in is obtained from the adiabatic EOS for an ideal gas, i.e., $e_{\mathrm{in}} = p_t / [\rho(\gamma - 1)]$, with γ being the adiabatic index, and the kinetic energy E_kin is calculated as $E_{\mathrm{kin}} = \rho |\mathbf{u}|^2 / 2$.
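The EOS relations above can be sketched as a pair of conversion routines. This is a minimal Python illustration assuming an ideal-gas γ; the variable ordering follows the state vectors of this section, not necessarily SADHANA's internal layout:

```python
import numpy as np

GAMMA = 5.0 / 3.0  # adiabatic index (illustrative value)

def prim_to_cons(q):
    """Map Q = (rho, ux, uy, uz, p_t) to U = (rho, rho*u, E) using the
    ideal-gas EOS: e_in = p_t / (rho * (GAMMA - 1))."""
    rho, ux, uy, uz, p = q
    e_kin = 0.5 * rho * (ux**2 + uy**2 + uz**2)
    E = p / (GAMMA - 1.0) + e_kin          # E = rho*e_in + E_kin
    return np.array([rho, rho * ux, rho * uy, rho * uz, E])

def cons_to_prim(u):
    """Inverse map, recovering Q from U."""
    rho, mx, my, mz, E = u
    ux, uy, uz = mx / rho, my / rho, mz / rho
    e_kin = 0.5 * rho * (ux**2 + uy**2 + uz**2)
    p = (GAMMA - 1.0) * (E - e_kin)
    return np.array([rho, ux, uy, uz, p])
```

The two maps are exact inverses of each other for any positive density and pressure, which is the property the solver relies on when switching between Q and U.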

Magnetohydrodynamics
Because SADHANA at the time of writing is developed to deal with cell-centered quantities alone, the classical numerical approach for an MHD system is difficult to apply in the solver. This is due to the conventional approach of storing the magnetic field $\mathbf{B} = (B_x, B_y, B_z)^T$ in a staggered arrangement so that it is numerically easier to keep a divergence-free condition, i.e., ∇ · B = 0, satisfied throughout the calculation. Fortunately, much effort has been made to prepare reliable algorithms for a numerical treatment of MHD on nonstaggered grids, including Chacón (2004), Tóth (2000), Rossmanith (2006), Helzel et al. (2011), and Dedner et al. (2002). We have chosen to follow the extended generalized Lagrange multiplier (EGLM) formulation of the MHD equations as pioneered by Dedner et al. (2002), due to its simplicity, where a scalar field ψ is evolved to maintain the divergence-free condition. As such, we have the primitive and conservative state vectors, the x-directional flux vector, and the source vector of the EGLM-MHD system defined as

$$\mathbf{Q} = \begin{pmatrix} \rho \\ \mathbf{u} \\ p_t \\ \mathbf{B} \\ \psi \end{pmatrix}, \quad
\mathbf{U} = \begin{pmatrix} \rho \\ \rho \mathbf{u} \\ E \\ \mathbf{B} \\ \psi \end{pmatrix}, \quad
\mathbf{F}_{x} = \begin{pmatrix} \rho u_x \\ \rho u_x^2 + p_t + \tfrac{1}{2}|\mathbf{B}|^2 - B_x^2 \\ \rho u_x u_y - B_x B_y \\ \rho u_x u_z - B_x B_z \\ u_x \left(E + p_t + \tfrac{1}{2}|\mathbf{B}|^2\right) - B_x (\mathbf{u} \cdot \mathbf{B}) \\ \psi \\ u_x B_y - u_y B_x \\ u_x B_z - u_z B_x \\ c_h^2 B_x \end{pmatrix}, \quad
\mathbf{S} = \begin{pmatrix} 0 \\ -(\nabla \cdot \mathbf{B})\, \mathbf{B} \\ -\mathbf{B} \cdot \nabla \psi \\ \mathbf{0} \\ -\dfrac{c_h^2}{c_p^2}\, \psi \end{pmatrix},$$

with the y- and z-directional fluxes following by cyclic permutation. Just as before, we can modify the source terms to consider any nonideal term (e.g., magnetic resistivity) and other forcing terms (e.g., gravity or viscous effects). The magnetic field is scaled by $\sqrt{\mu_0}$, with μ_0 being the magnetic permeability of the medium. The total energy of the MHD system is E = ρe_in + E_kin + E_mag, with the magnetic energy $E_{\mathrm{mag}} = |\mathbf{B}|^2 / 2$. The values of c_p and c_h are adjusted by the user to account for the characteristic propagation speed and the damping rate. For an in-depth description of the EGLM-MHD system with a detailed explanation of every term, we refer to Dedner et al. (2002).

SADHANA Adaptive Mesh Refinement Framework
SADHANA uses a simple but versatile data structure to control the nested-mesh system, which is dynamically adapted and modified as per requirement during computation. The main structure of the code is maintained in a modular and optimized fashion such that adjustments can easily be implemented to account for specific physics problems, as described in Section 2. We now proceed to describe the AMR implementation in SADHANA. The general configuration of the nested block-structured mesh is explained in Section 3.1, followed by a short discussion of the new definitions of the block boundaries used for our work in Section 3.2. The related data structure, SFC generation, and load-balancing procedures are discussed in Sections 3.3, 3.4, and 3.5, respectively. The procedure of checking the refinement criteria for AMR and the associated numerical adjustments are given in Sections 3.6 and 3.7. Finally, Section 3.8 gives a detailed description of the physics-independent basic numerical approach, including flux calculation and time integration to solve the system of equations.

Nest of Blocks
At any particular time, the domain is covered by a nest of multiple blocks that can have a user-defined number of computational cells (by default, we set it to 10) in each direction. As such, each block ends up with the same {nx, ny, nz} numbers of cells in the {x, y, z} directions, represented by the {i, j, k} integers for loop operations. If needed, these blocks can be divided into fixed numbers (2, 4, and 8 for 1D, 2D, and 3D AMR, respectively) of child blocks. The division is done in such a way that any two child blocks that have the same mother block (i.e., sibling blocks) have exactly the same physical dimensions. This operation is repeated in an iterative fashion based on some criteria, as explained in Section 3.6, to provide the required resolution at various locations of interest. Owing to the fact that the individual cell size of any block with a certain level of refinement (LOR) is constant throughout the simulation, and because the number of cells in each block is kept fixed regardless of its LOR, we can define the refinement ratio between the mesh intervals of any two LOR values (l − 1) and l as

$$\frac{\Delta x_i^{(l-1)}}{\Delta x_i^{(l)}} = 2.$$

Note that the LOR of each child block is one level higher than that of its mother block. Finally, the nest of blocks with different refinement levels is organized in such a fashion that the maximum value of the difference in LOR between any two neighboring blocks is 1, and thus, we always have at least one LOR = (l − 1) block between a LOR = l and a LOR = (l − 2) block.
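The 2:1 division rule above can be sketched in a few lines. This is an illustrative Python fragment, not taken from the code; the default of 10 cells per direction follows the text:

```python
def split_block_2d(bounds):
    """Divide a 2D block's physical bounds ((xmin, xmax), (ymin, ymax))
    into four equal child blocks, as in a 2:1 refinement."""
    (x0, x1), (y0, y1) = bounds
    xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
    return [((x0, xm), (y0, ym)), ((xm, x1), (y0, ym)),
            ((x0, xm), (ym, y1)), ((xm, x1), (ym, y1))]

def dx_of(bounds, nx=10):
    """Cell size in x for a block holding nx cells in that direction."""
    (x0, x1), _ = bounds
    return (x1 - x0) / nx
```

Because every block keeps the same nx regardless of its LOR, the cell-size ratio between consecutive levels is exactly 2, as in the relation above.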

Modified Definition of Block Boundaries
In a conventional numerical approach, for any computational volume, only two boundaries are considered in each direction, i.e., LEFT (Ω_Lft) and RIGHT (Ω_Rht) for x, TOP (Ω_Top) and BOTTOM (Ω_Bot) for y, and finally, FRONT (Ω_Frn) and BACK (Ω_Bck) for z. Only the neighboring blocks with the same LOR value are connected, because the block interface covers the block boundaries completely in these cases. Because we only perform the computation on leaf blocks (i.e., blocks without a child), whenever there is any need of communication between two neighboring leaf blocks, we first check the neighbor at the same LOR, and if it is not a leaf block, we then iterate through its mother/child to access the actual leaf block. While the use of an optimized tree-type data structure makes this leaf search efficient, in this paper, we follow a different approach altogether. We now discuss some key arguments and explain why some modifications in the definition of block neighbors can be a welcome step to simplify the neighbor leaf block search. We consider two neighboring blocks A and B that share a boundary: the two blocks may have the same LOR, so that the shared interface covers the boundary completely (Case 1); block B may be one level finer than A, so that B's interface covers only a fraction β of A's boundary (Case 2); or block B may be one level coarser than A (Case 3). Here β = 0.5 for a 2D AMR and β = 0.25 for a 3D AMR.
Note that we always need to iterate through the mother/child whenever the block interface does not completely cover the corresponding block boundary, which is always the case for Case 2 and Case 3 following the conventional definitions of boundaries. As such, considering the 2:1 refinement, we modify these definitions such that each interface completely covers its boundary, and this remains true regardless of the LOR difference between A and B.
For a 2D AMR, we divide each of the Ω_Lft, Ω_Rht, Ω_Top, and Ω_Bot boundaries into two equal parts, giving a set of eight new boundaries. For a 3D AMR, we obtain a total set of 24 new boundaries by dividing each of the Ω_Lft, Ω_Rht, Ω_Top, Ω_Bot, Ω_Frn, and Ω_Bck boundaries into four equal parts. In both cases, the division is done in such a way that each new boundary covers a single and different corner of the original boundary itself. Figure 1 gives a schematic representation of a 2D AMR block with these modified definitions. In Figure 1 we also show the conventional definition of block boundaries for comparison. It should be noted that, depending on the physics and/or numerical methods, we may need to consider the complete neighborhood of a block for 2D and 3D problems. As a result, we need to consider four additional boundaries at the corners for the 2D case, and eight boundaries at the corners and 24 boundaries along the edges (each edge leads to two boundaries, as discussed before) for the 3D case.
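The half-boundary idea for the 2D case can be sketched as follows; the sub-boundary labels here are hypothetical, chosen only for illustration:

```python
def split_boundary_2d(lo, hi):
    """Split one conventional 2D block boundary (a segment [lo, hi]) into
    two halves. Each half is then matched exactly by the interface of a
    single finer (LOR + 1) neighbor, i.e., a fraction beta = 0.5 of the
    original boundary."""
    mid = 0.5 * (lo + hi)
    return (lo, mid), (mid, hi)

# Eight sub-boundaries obtained from the four conventional ones
# (labels illustrative, not SADHANA's own naming):
SUB_BOUNDARIES_2D = [side + half
                     for side in ("Lft", "Rht", "Top", "Bot")
                     for half in ("_A", "_B")]
```

With this splitting, a leaf block always stores one neighbor pointer per sub-boundary, so a neighbor's interface coincides with the stored sub-boundary regardless of the LOR difference.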
As explained in the next sections, child blocks are generated from each mother block that undergoes the refinement procedure. The neighbors along the boundaries of these newly created blocks need to be updated as soon as they are created. Following the child id of each new block, these boundaries can be identified in an iterative fashion. When the neighborhood of the mother block is already defined, this process can be derived directly from the location and position of each child block in the inner domain of the mother block. For simplicity, we restrict ourselves to a 2D scenario and proceed to provide a detailed list of boundaries for all children. Similar relationships can be found for the remaining cases by following the child id and the child position in the mother block as well.

Block_Unit: Data Structure for Single Block Information
[Figure 1: Mother-children AMR boundaries. Panel (b) shows the mother block (0) and its children (1, 2, 3, and 4); Ω_0 represents the mother-block boundaries, while Ω_i with i = 1, 2, 3, 4 are the boundaries of its children. Panel (c) shows the mother-children AMR boundaries following the conventional definition.]

We now focus our attention on the data structure used for storing the information of individual blocks. The requirement for this structure is that every individual block should be responsible for the same amount of computational load, regardless of its LOR. Figure 2 presents the Block_Unit structure for a 2D AMR application. Because, most of the time, independent parameters such as the number and size of the cells in the different directions and the size of the physical domain associated with the block are either constant or depend only on the LOR, we do not need to save them explicitly in this data structure. The block boundaries, $[x^i_{\min}, x^i_{\max}]$ for the ith direction, are saved explicitly. Following the structure of doubly linked lists, each Block_Unit has two pointers, directed toward the previous and the next member of a doubly linked list consisting of the leaf blocks at any time during the simulation. Every time we perform a refinement/derefinement process, this doubly linked list must be updated following the appropriate ordering, as explained before. Similar pointers (directed toward other Block_Unit structures) are used to maintain information regarding the boundaries of these blocks. A set of array pointers is used to hold the conservative and primitive variables as well as the interface fluxes for the computational cells inside these blocks. Integer-type variables hold the values of the refinement level, the current processor to which the block belongs, the SFC value (explained in detail in the next section), the refinement/derefinement flag, and the index of the current block as a child of its mother block.
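A Python rendering of the fields just listed might look as follows. This is a sketch only; the actual structure is defined in the code and in Figure 2, and the field names here are illustrative:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BlockUnit:
    """Per-block record: explicit physical bounds, AMR metadata,
    doubly-linked-list pointers over the leaf blocks, and (elided here)
    array pointers for the field variables and interface fluxes."""
    x_bounds: tuple                      # [x_min, x_max]
    y_bounds: tuple                      # [y_min, y_max]
    level: int = 0                       # LOR
    owner: int = 0                       # process holding the block
    sfc_index: int = 0                   # position along the Hilbert SFC
    refine_flag: int = 0                 # -1 derefine / 0 keep / +1 refine
    child_id: int = 0                    # index as child of the mother block
    prev: Optional["BlockUnit"] = None   # previous leaf block in the list
    next: Optional["BlockUnit"] = None   # next leaf block in the list
    neighbors: dict = field(default_factory=dict)  # sub-boundary -> block
```

Traversing `next` pointers from the first leaf block walks the entire active mesh in SFC order, which is exactly what the load-balancing step needs.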

Doubly Linked List and Generation of Nested Hilbert SFC
We now describe our approach to preparing a face-continuous Hilbert SFC from the active blocks of different levels. One of the unique features of our work is its complete independence from the requirement of having a well-optimized explicit tree data structure, unlike most other AMR codes. We take advantage of the fact that after refinement, the quadrants/octants of a computational block become the children (similar to the logic behind quadtree/octree data structures). First, we assign unique integers to each child of a mother. The locations of each child block, along with the associated numbers used in this study (1D, 2D, and 3D), are given in Table 1. Next, we define different sets of specific sequences using these child numbers and call them "base patterns". When the centers of the children are connected according to these sequences, a graph is formed representing a variant of a first-order Hilbert curve. Basically, these base patterns are indicators of the linked-list-based connectivity among the children, i.e., how the previous and next Block_Unit pointers would be prepared for the child blocks.
For example, the 2D type-I base pattern is defined as (1 → 2 → 4 → 3), which means that the first-order Hilbert curve for this pattern can be retrieved by connecting the center of Child 1 to Child 2, Child 2 to Child 4, and Child 4 to Child 3. Altogether, we end up with 2, 4, and 12 different variants of the base patterns in 1D, 2D, and 3D, respectively, which are described in Table 2.
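To illustrate, we assume a row-major child layout (Child 1 bottom-left, 2 bottom-right, 3 top-left, 4 top-right; the actual convention is given in Table 1) and trace the type-I pattern through the child centers:

```python
# Assumed 2D child layout (Table 1 holds the actual convention):
#   3 4
#   1 2
CHILD_CENTER = {1: (0.25, 0.25), 2: (0.75, 0.25),
                3: (0.25, 0.75), 4: (0.75, 0.75)}

TYPE_I_2D = (1, 2, 4, 3)  # the 2D type-I base pattern from the text

def first_order_curve(pattern):
    """Child centers visited in pattern order; connecting them draws a
    first-order Hilbert-curve variant."""
    return [CHILD_CENTER[c] for c in pattern]
```

Under this layout, each step of the type-I sequence moves between face-adjacent children, which is the face-continuity property the SFC construction relies on.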
We now demonstrate how the refinement is performed with our algorithm. Consider the pre- and post-refinement situation shown in Figure 3. We assume that we have done the first level (l = 1) of refinement and a Type-II pattern has been assigned to the current 2×2 blocks. As such, the Hilbert curve is represented by the corresponding sequence of child blocks. The dimension of our global domain of interest is [L_x, L_y]. When the child block with ID = 4 is selected for refinement by the predefined criterion (see Section 3.6 for details), we need to prepare a modified Hilbert curve for this configuration. To do this, we first assign a new pattern to this localized zone centered at [3L_x/4, 3L_y/4] and then merge it with the base pattern of the previous level to create the new Hilbert curve. The type of this new base pattern should be selected in such a way that the local direction of the curve does not change. Before we continue with our example, we note our observation that the choice of the pattern type to be assigned to any particular child after refinement depends on the pattern type at the previous level and on the number of the specific child where the refinement operation is done. We can thus set up a list of all possible choices of this new pattern corresponding to a specific SFC type and child. When the Hilbert SFC is considered, the possibilities are given by Table 3 for the 1D, 2D, and 3D cases. Returning to our example, we now select the Type-II pattern for the 2D AMR, as given by Table 3 for the current scenario. Finally, we merge the patterns together following the new sequence, where the subscript and superscript represent the child number and refinement level, respectively. Note that this process would change according to the pattern type of the first level (l = 1) of refinement. This process can be repeated recursively to create a set of underlying self-similar patterns that can form nested Hilbert curves of arbitrary order.
Whenever we refine a block in the implicit nested tree, we simply push the ordering of the child blocks into the original list. Afterward, using these base patterns at each active level, a sorted index list is prepared for all the active (or leaf) blocks. This sorted index list is then used to distribute the blocks among the various processes. The algorithm and corresponding data structure described here are based on three building blocks: (i) select an SFC with the self-similar property and set up the base patterns for it, (ii) select the number of children to support the chosen SFC, and (iii) set up the table of allowed pattern transitions, as given in Table 3. This approach can easily be used for different SFCs with slight modifications to the data structures (e.g., adding another parameter to our lists, namely, the current level). [Table 1: the children are centered at positions [x_ic] relative to the mother block, whose dimension in the x_i direction is L_{x_i} and whose center is at x_i = 0.5 L_{x_i}; i = 1 for 1D, i = 1, 2 for 2D, and i = 1, 2, 3 for 3D.]
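In list terms, the refinement step reduces to a splice: the mother's node in the doubly linked list is replaced by its children in base-pattern order. A minimal sketch (illustrative Python, not the SADHANA source):

```python
class Node:
    """Doubly-linked-list node standing in for a leaf Block_Unit."""
    def __init__(self, name):
        self.name, self.prev, self.next = name, None, None

def splice_children(mother, children):
    """Replace `mother` in the list by `children`, which are already
    ordered by the base pattern chosen for the refined block, so that
    the SFC stays continuous through the refined region."""
    for a, b in zip(children, children[1:]):   # link children together
        a.next, b.prev = b, a
    first, last = children[0], children[-1]
    first.prev, last.next = mother.prev, mother.next
    if mother.prev is not None:
        mother.prev.next = first
    if mother.next is not None:
        mother.next.prev = last
```

Derefinement is the inverse splice: the run of sibling nodes is replaced by a single mother node, so neither operation requires walking an explicit tree.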

Parallelization and Load Balancing
At present, parallelization in SADHANA is only done by means of a message-passing interface (MPI), and we have no immediate plan to implement any facility for open multiprocessing (OpenMP). Before the simulation starts, we construct an initial SFC map on a single process until a user-defined LOR is achieved (note that this initial maximum LOR is different from the maximum refinement to be allowed during the simulation). Afterward, domain decomposition is done following this SFC, which is then used to distribute the computational workload to different processes for parallel computation on distributed-memory architectures. When this is done, we prepare and assign the field variables based on a user-defined initial condition. As such, each process evolves the simulation in a subset of the full domain, which is defined by a corresponding subset of the 1D block list already prepared and sorted as discussed before. Each process saves only the time-dependent variables associated with the subdomain assigned to that process alone. At any particular time, the data for a given block belong to only one process, and thus the decomposition remains fully parallel among multiple processors. At any point in time, suppose we have a total number of N_b blocks to be distributed over N_p processors. As such, we have n and (n + 1) blocks distributed over p_1 and p_2 processors, respectively, such that N_b = n p_1 + (n + 1) p_2, where N_p = p_1 + p_2. The maximum possible load difference between two processors at any given time is ΔN = nx × ny × nz, which is basically the computational load of a single block, and in the ideal case we obtain p_2 = N_p − p_1 = 0.
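The n/(n + 1) split described above can be sketched in a couple of lines (illustrative Python):

```python
def block_distribution(n_blocks, n_procs):
    """Split N_b leaf blocks over N_p processes: p1 processes receive n
    blocks and p2 processes receive n + 1, so that
    N_b = n*p1 + (n+1)*p2 with N_p = p1 + p2."""
    n, p2 = divmod(n_blocks, n_procs)
    p1 = n_procs - p2
    return n, p1, p2
```

In the ideal case N_b is a multiple of N_p, so p_2 = 0 and every process carries exactly n blocks.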

Refinement and Derefinement Operations
The refinement/derefinement operation is performed using a consecutive series of substeps. Each block has two flag variables, one for the mesh-update and one for the mesh-refinement operation. The value of the update flag can be 0 (no update) or 1 (update needed), while the refinement flag can be −1 (derefinement), 0 (no refinement), or 1 (refinement). The default values of all these flags are set to 0. Refinement or derefinement is performed only if the mesh-update flag is set to 1. First, an identification step is performed in an iterative manner over all leaf blocks to mark the required zones for refinement. In this work, we have used a second-order, gradient-based error indicator function following Jameson et al. (1981),

$$\eta = \frac{\left| \bar q_{i+1} - 2 \bar q_i + \bar q_{i-1} \right|}{\left| \bar q_{i+1} - \bar q_i \right| + \left| \bar q_i - \bar q_{i-1} \right| + \epsilon},$$

where ε is a nonzero number of value 10^−20, included just to prevent a division-by-zero situation. We evaluate the value of η against two user-defined tolerance factors, namely tol_ref and tol_deref. If the value of η in any cell of a block becomes greater than tol_ref, the mesh-refinement flags associated with the current block and its neighbors are all set to 1, tagging them for refinement. If η is less than tol_deref for all cells in the block and its sibling blocks (which are created from the same mother block), the mesh-refinement flag is set to −1, such that we get back the mother block after the derefinement operation.

[Table 3, 3D part: pattern assigned to each refined child (columns), given the mother-block pattern (first column).]

    I:    II   III  III  V    V    X    X    IX
    II:   III  I    I    XI   XI   VIII VIII VI
    III:  I    II   II   VII  VII  IV   IV   XII
    IV:   VI   V    V    IX   IX   XII  XII  III
    V:    IV   VI   VI   I    I    VII  VII  XI
    VI:   V    IV   IV   X    X    II   II   VIII
    VII:  VIII IX   IX   III  III  XI   XI   V
    VIII: IX   VII  VII  XII  XII  VI   VI   II
    IX:   VII  VIII VIII IV   IV   I    I    X
    X:    XII  XI   XI   VI   VI   IX   IX   I
    XI:   X    XII  XII  II   II   V    V    VII
    XII:  XI   X    X    VIII VIII III  III  IV
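As a sketch, a Jameson-type second-order sensor consistent with this description can be written as follows (illustrative Python; the check over all sibling blocks for derefinement is elided, and the exact expression used in the code may differ):

```python
import numpy as np

EPS = 1e-20  # prevents division by zero, as in the text

def indicator(q):
    """Second-order, gradient-based error indicator along one direction
    for an array of cell-averaged values q (Jameson-type form)."""
    num = np.abs(q[2:] - 2.0 * q[1:-1] + q[:-2])
    den = np.abs(q[2:] - q[1:-1]) + np.abs(q[1:-1] - q[:-2]) + EPS
    return num / den

def flag_block(q, tol_ref, tol_deref):
    """Return +1 (refine), -1 (derefine), or 0 for one block's data."""
    eta = indicator(q)
    if np.any(eta > tol_ref):
        return 1
    if np.all(eta < tol_deref):
        return -1
    return 0
```

Smooth (near-linear) data produce η close to 0 and get tagged for derefinement, while a discontinuity drives η toward 1 and triggers refinement.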
When the marking is completed, we first iterate over all active blocks to perform the derefinement, replacing the entries corresponding to all sibling child blocks in the linked list by their mother during the operation. Next, we perform the refinement process in a similar iterative fashion and push the entries of the newly created children into the linked list in place of their mother block. After new blocks at a finer level are produced, linear spatial interpolation is used to project the solutions from the mother (coarse) block to its children, which hold a finer mesh. The details of these numerical operations are given next. Finally, we reset the refinement flags to 0 for all active blocks in the newly updated Hilbert SFC.

Restriction, Flux Correction, and Prolongation
Every derefinement and refinement operation is associated with a restriction and prolongation procedure, respectively. Whenever sibling blocks are derefined to create a mother block with an LOR reduced by one, the volume-averaged conserved variables stored at the cell centers of the computational cells belonging to the child blocks are restricted to compute the volume-averaged values of the variables for the cells in the newly created mother block at the lower LOR. The same procedure must also be followed to obtain the values in the ghost cells of a coarse block that shares a boundary with a finer (one level higher LOR) block. Let $\bar{\mathbf U}_c$ be the volume-averaged value in a coarse cell of volume $\Delta V_c$, and $\bar{\mathbf U}_f^p$ the values in the $2^{N_D}$ fine cells of volume $\Delta V_f^p$ that it contains; then

$$\bar{\mathbf U}_c = \frac{1}{\Delta V_c} \sum_{p=1}^{2^{N_D}} \bar{\mathbf U}_f^p \, \Delta V_f^p,$$

where N_D is the dimension of the AMR application.
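For equal-sized Cartesian cells, the restriction is a plain average over the 2^{N_D} children. In 2D this is a one-line stencil (illustrative Python):

```python
import numpy as np

def restrict_2d(fine):
    """Average each 2x2 group of fine-cell values onto one coarse cell.
    For uniform cell volumes this equals the volume-weighted restriction."""
    return 0.25 * (fine[0::2, 0::2] + fine[1::2, 0::2]
                   + fine[0::2, 1::2] + fine[1::2, 1::2])
```

The operation is conservative: the total of the coarse values times the (fourfold larger) coarse cell volume equals the total over the fine cells.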
A similar approach is used to correct the values of the face-averaged interface fluxes on the boundaries of the coarser block for N_D > 1. Although we can directly compute these fluxes across coarser cell boundaries connected with finer cells, the conservative property of the numerical scheme would be distorted because these values are unlikely to exactly match the corresponding interface fluxes computed on the cells belonging to the finer block (one level higher LOR), owing to the difference in truncation errors during discretization. A flux-correction procedure must therefore be enforced by correcting the interface fluxes on these coarse cells using the values calculated on the finer cells. Given the face areas $\Delta A_f^p$ and interface fluxes $\mathbf F_f^p$ of the finer cells on the shared coarse/fine interface, the corrected value of the interface flux $\mathbf F_c$ for the coarser mesh with a face area $\Delta A_c$ is calculated as

$$\mathbf F_c = \frac{1}{\Delta A_c} \sum_{p} \mathbf F_f^p \, \Delta A_f^p.$$

Similar to the scenario for restriction, we need to perform a prolongation process on the volume-averaged, cell-centered quantities in the coarser cells to calculate the corresponding values in the cells belonging to the newly created refined blocks, or in the ghost cells of a finer block sharing a boundary with a coarser block. Unlike the restriction process, the prolongation itself is not conservative, and as such, it may be useful to apply it to the primitive variables. We describe the prolongation process for a 3D setup because the equations for lower dimensions are analogous. Let $\bar q_c$ be the coarse-cell value at the cell center $\mathbf x_c$, and $\hat q_f$ the prolonged value at a fine-cell center $\mathbf x_f$; then the prolongation procedure for the finer cells is defined as

$$\hat q_f = \bar q_c + \sum_{i=1}^{3} \mathrm{minmod}\!\left( \frac{\bar q_c - \bar q_{c,-1_i}}{\Delta x_i},\; \frac{\bar q_{c,+1_i} - \bar q_c}{\Delta x_i} \right) \left( x_{f,i} - x_{c,i} \right),$$

where $\bar q_{c,\pm 1_i}$ denote the coarse values in the neighboring cells along the $x_i$ direction. It should be noted that the choice of the slope limiter in the prolongation process is restricted to minmod alone because, in the end, we construct a multidimensional profile using a linear combination of 1D gradients. Our numerical tests suggest that any limiter sharper than minmod leads to significant oscillations near the interfaces.
This is consistent with the recent work of Stone et al. (2020), and as such, we refrain from exploring alternative options at present. When variables such as density and/or pressure become negative or zero during prolongation, the simulation must be restarted with an increased minimum resolution, which can be achieved by increasing the number of computational cells per block. Figure 4 shows a situation in which a block with a lower LOR (i.e., with coarser cells) is placed on top of two finer blocks. Following the nodes marked for prolongation and restriction, we can clearly see that the restriction process should be performed before prolongation: the nodes required for restriction remain inside the block's inner zone, while the nodes required for prolongation often include the coarse cells in the ghost region of the coarse block.

Base Numerical Schemes and Time Integration
The block-based semidiscrete form of Equation (1) for the 3D problem is given by d ū^b_{i,j,k}/dt = −(1/Δ_b V_{i,j,k}) [ (ΔA F̂)_{i+1/2,j,k} − (ΔA F̂)_{i−1/2,j,k} + (ΔA Ĝ)_{i,j+1/2,k} − (ΔA Ĝ)_{i,j−1/2,k} + (ΔA Ĥ)_{i,j,k+1/2} − (ΔA Ĥ)_{i,j,k−1/2} ], where F̂, Ĝ, and Ĥ are the face-averaged interface fluxes along the three directions.
In these definitions, Δ_b V_{i,j,k} is the volume of the computational cell in the bth block with cell center located at (x_i, y_j, z_k), and the primitive variables q are obtained from their conservative counterparts following the physics-driven EOS. We follow the second-order variant of the Monotone Upstream-centered Scheme for Conservation Laws (MUSCL) used by Nessyahu & Tadmor (1990) and Kurganov & Tadmor (2000). In this approach, the left state q̂^L_{i+1/2,j,k} and the right state q̂^R_{i+1/2,j,k} at the interface i+1/2 are obtained from limited linear reconstructions, e.g., q̂^L_{i+1/2,j,k} = q̄_{i,j,k} + (1/2) α(r_i) (q̄_{i,j,k} − q̄_{i−1,j,k}), where r_i = (q̄_{i+1,j,k} − q̄_{i,j,k})/(q̄_{i,j,k} − q̄_{i−1,j,k}) is the ratio of the differences along the x-direction and α(r) is a limiter function. We have implemented a variety of second-order limiters, including van Leer's. Afterward, a physics-specific Riemann solver is applied to calculate the interface flux using the interpolated face-averaged values of the left and right states. Essentially, the Riemann solver defines a flux function F_Riemann such that F̂_{i+1/2,j,k} = F_Riemann(q̂^L_{i+1/2,j,k}, q̂^R_{i+1/2,j,k}).
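The slope-limited reconstruction above can be sketched in its equivalent slope form, where a limited slope s_i is built per cell and the interface states are q_i ± s_i/2. A minimal 1D illustration with the minmod limiter (sharper limiters such as van Leer's fit the same pattern):

```python
import numpy as np

def minmod(a, b):
    # Argument of smaller magnitude when signs agree, else zero.
    return np.where(a * b > 0.0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def muscl_states(q):
    """Limited left/right states at the interior interfaces of a 1D array.

    Slope form of second-order MUSCL:
        s_i        = minmod(q_i - q_{i-1}, q_{i+1} - q_i)
        qL_{i+1/2} = q_i     + s_i/2
        qR_{i+1/2} = q_{i+1} - s_{i+1}/2
    Returns the states at interfaces 3/2 .. n-5/2 (len(q) - 3 of them),
    where both neighboring slopes are available.
    """
    q = np.asarray(q, dtype=float)
    s = np.zeros_like(q)
    s[1:-1] = minmod(q[1:-1] - q[:-2], q[2:] - q[1:-1])
    qL = q[1:-2] + 0.5 * s[1:-2]
    qR = q[2:-1] - 0.5 * s[2:-1]
    return qL, qR
```

For smooth linear data the left and right states coincide at every interface, which is what makes the scheme second-order accurate away from extrema.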


In this work, we have used HLLC (as discussed in Appendix A) and HLLD (as discussed in Appendix B) Riemann solvers for hydrodynamic and MHD systems, respectively.

Time Integration
SADHANA currently uses a second-order accurate Runge-Kutta method without any subcycling for time integration. The procedure followed is presented here.

A projection approach is used at the beginning of each time integration to project the solution available at the nth time step onto the AMR framework for the (n+1)th time step, so that every leaf block b = 1, 2, ..., n_b holds a valid solution, with n_b being the maximum number of leaf blocks per processing unit. This is attained through the following set of operations: (a) check the refinement criterion of all active blocks at the current process, (b) derefine and restrict selected sibling blocks following Equation (5), and (c) refine and perform prolongation following Equation (8) for every bth block. We should note that for the EGLM-MHD case, we do not include the source term for the ψ equation in the Runge-Kutta stages, as that is taken care of later; the ψ source term is excluded from the second stage as well, as before. Once the integration is completed, we need to address some physics-specific operations. For the EGLM-MHD system, this is where we accommodate the ψ source term.
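One common second-order Runge-Kutta (Heun) update has the form u^(1) = u^n + Δt L(u^n), u^{n+1} = (u^n + u^(1) + Δt L(u^(1)))/2. A sketch of this stage arrangement (the paper's exact sequence of AMR operations between stages is not reproduced here):

```python
def rk2_step(u, rhs, dt):
    """One Heun-type second-order Runge-Kutta step.

    `rhs` is any spatial operator L(u); in an AMR code it would include
    the flux divergence on each leaf block plus ghost-cell exchanges.
    """
    u1 = u + dt * rhs(u)                 # predictor: forward Euler stage
    return 0.5 * (u + u1 + dt * rhs(u1)) # corrector: average of stages
```

Applied to the test equation du/dt = -u with dt = 0.1, one step gives 1 - dt + dt^2/2 = 0.905, the expected second-order truncation of exp(-dt).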

Numerical Examples and Computational Performance
While the main novelty of this work lies in the aforementioned modified definition of block boundaries and the list-based approach to preparing face-continuous Hilbert-type SFCs, we also demonstrate the flexibility and applicability of our adaptive mesh solver by means of numerical examples. Both hydrodynamic and MHD physics are considered in these examples.

1D Sod Shock Tube
Following the conventional approach to code validation, we start with the classic shock tube test of Sod (1978). The initial condition for the test is (ρ, v, p) = (1, 0, 1) for x < x_s and (ρ, v, p) = (0.125, 0, 0.1) for x ≥ x_s. We set the initial discontinuity location at x_s = 0.5 and the adiabatic index γ = 1.4. A variety of cells/block configurations is considered, and we use CFL = 0.4 for all cases. We run the simulation until t_max = 0.2 in the computational domain [0 ≤ x ≤ 1]. As can be seen from the results given in Figure 5, all three features of the system, namely the left rarefaction, the contact discontinuity, and the right shock, are captured in our solution.
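The Sod initial data can be set up in a few lines (a sketch; `sod_initial_state` is a hypothetical helper using the classic left/right values):

```python
import numpy as np

def sod_initial_state(x, x_s=0.5):
    """Classic Sod (1978) initial data (rho, v, p):
    (1, 0, 1) left of the diaphragm and (0.125, 0, 0.1) right of it."""
    left, right = (1.0, 0.0, 1.0), (0.125, 0.0, 0.1)
    rho = np.where(x < x_s, left[0], right[0])
    v = np.where(x < x_s, left[1], right[1])
    p = np.where(x < x_s, left[2], right[2])
    return rho, v, p
```

From these primitives, the conserved total energy per cell follows as E = p/(γ-1) + ρv²/2 with γ = 1.4.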

1D Shu-Osher Shock Tube
Next, we consider the shock tube test described in Shu & Osher (1988). The initial profile is (ρ, v, p) = (3.857143, 2.629369, 10.33333) for x < x_s and (ρ, v, p) = (1 + A_ρ sin(f_ρ x), 0, 1) for x ≥ x_s. We set the initial discontinuity location at x_s = −4.0, and the density perturbation is defined by the amplitude A_ρ = 0.2 and the frequency f_ρ = 5.0. The computational domain is [−5 ≤ x ≤ 5], and a zero-gradient boundary condition is used at both boundaries. With γ = 1.4 and CFL = 0.4, we run the simulation until t_max = 1.8. Figure 6 shows the final result.

2D Isentropic Vortex
The problem is described in Yee et al. (2000) and is a very good test of the accuracy of hydrodynamic solvers because an exact solution is available. The initial condition is defined in the primitive state by superposing a vortex of strength β on a uniform background flow v_0: the temperature is T = 1 − (γ−1)β²/(8γπ²) exp(1−r²), the density and pressure are ρ = T^(1/(γ−1)) and p = ρT, and the velocity is (v_x, v_y) = v_0 + (β/2π) exp((1−r²)/2) (−(y−y_0), (x−x_0)), with r² = (x−x_0)² + (y−y_0)². We have used x_0 = 0, y_0 = 0 as the initial location of the vortex and β = 5. The spherical symmetry of the system is disturbed by v_0; otherwise, our system would remain in a time-invariant state, because the centrifugal force is exactly balanced by the pressure gradient toward the vortex center. The simulation is run using CFL = 0.4 with periodic boundaries in all directions. The computational domain is defined by [−5 ≤ x ≤ 5, −5 ≤ y ≤ 5], and an adiabatic index γ = 7/5 is used. We run the simulation until t_max = 10. Second-order convergence is achieved in the L1 and L2 norms of the error for both density and pressure, as shown in Figure 7.
We take L = 1, k = 2, and λ = 0.01. Other relevant computational parameters are CFL = 0.4 and the adiabatic index γ = 5/3. Figure 8 shows the results at t = 1.5.
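A minimal sketch of the isentropic vortex initialization in the common Yee et al. (2000) form (`yee_vortex` is a hypothetical helper; the background velocity v0 = (1, 1) is an assumed value, and the construction guarantees the isentropic relation p = ρ^γ by design):

```python
import numpy as np

def yee_vortex(x, y, beta=5.0, gamma=1.4, v0=(1.0, 1.0)):
    """Isentropic-vortex initial condition, Yee et al. (2000) form:
        T         = 1 - (gamma-1) beta^2 / (8 gamma pi^2) * exp(1 - r^2)
        rho       = T**(1/(gamma-1)),  p = rho * T  (so p = rho**gamma)
        (vx, vy)  = v0 + beta/(2 pi) * exp((1-r^2)/2) * (-y, x)
    centered on the origin."""
    r2 = x**2 + y**2
    T = 1.0 - (gamma - 1.0) * beta**2 / (8.0 * gamma * np.pi**2) * np.exp(1.0 - r2)
    rho = T**(1.0 / (gamma - 1.0))
    p = rho * T
    ex = np.exp(0.5 * (1.0 - r2))
    vx = v0[0] - beta / (2.0 * np.pi) * ex * y
    vy = v0[1] + beta / (2.0 * np.pi) * ex * x
    return rho, vx, vy, p
```

Because p = ρT with ρ = T^(1/(γ−1)), the entropy p/ρ^γ is exactly unity everywhere, which is the property the convergence study relies on.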

2D Double Mach Reflection
This problem is described in Woodward & Colella (1984) and is a classical two-dimensional test case for hydrodynamic solvers. The computational domain for this problem is [0 ≤ x ≤ 4, 0 ≤ y ≤ 1], and a plate lies at the bottom of the computational domain starting from x = 1/6. Initially, a right-moving Mach 10 shock is positioned at x = 1/6, y = 0 and makes a 60° angle with the x-axis. Along the top boundary, we use postshock values for 0 ≤ x < s(t) and preshock values for s(t) ≤ x ≤ 4, where s(t) denotes the instantaneous shock position along that boundary. Figure 9 shows the result at t = 0.2. This test is a very good example of the capability of SADHANA to use both square and rectangular blocks for hydrodynamic problems.

1D Brio-Wu Shock Tube
An extension of Sod's shock tube problem to include magnetic fields was given in Brio & Wu (1988). This test can be used to verify the ability of an MHD solver to capture the characteristic waves, i.e., the shocks, rarefactions, and contact discontinuity. The initial state is (ρ, v_x, v_y, v_z, p, B_x, B_y, B_z) = (1, 0, 0, 0, 1, 0.75, 1, 0) for x < x_s and (0.125, 0, 0, 0, 0.1, 0.75, −1, 0) for x ≥ x_s (20). We set x_s = 0.5 and the adiabatic index γ = 2. We simulate the problem until t_max = 0.12 with CFL = 0.4 in a computational domain of [0 ≤ x ≤ 1]. At both boundaries, we keep the boundary values fixed at the initial profile. The final results are shown in Figure 10.

1D Shu-Osher-Susanto Shock Tube
We consider a modified version of the Shu-Osher shock tube case, as developed by Susanto for an MHD problem (Susanto 2014). The initial conditions for this test are (ρ, v_x, v_y, v_z, p, B_x, B_y, B_z) = (3.5, 5.8846, 1.1198, 0, 42.0267, 1, 3.6359, 0) for x < x_s and (1 + 0.2 sin(5x), 0, 0, 0, 1, 1, 1, 0) for x ≥ x_s. Similar to the hydrodynamic case, we solve the problem in the [−5 ≤ x ≤ 5] domain with a zero-gradient boundary condition and set x_s = −4. The adiabatic index is taken as γ = 5/3, and we run the simulation until t_max = 0.7 with CFL = 0.4. This problem has no analytic solution, so we use a highly refined grid (with N = 131072 cells) to compute the reference solution. The final result is given in Figure 11.

2D Isodensity MHD Vortex
This problem was described in Balsara (2004) and, like the test in Section 4.1.3, is an excellent candidate for a convergence study of an MHD solver because an exact solution is available. The initial condition of the primitive state vector Q is defined by adding a perturbation δQ to the unperturbed background Q̄, with x_0 = 0, y_0 = 0 being the vortex location at initialization. Since the vortex is simply advected under equilibrium conditions, similar to the isentropic vortex problem discussed before, the exact solution at any time t is obtained by translating the initial profile with the background velocity. The problem is solved in the domain [−5 ≤ x ≤ 5, −5 ≤ y ≤ 5] with periodic boundaries in all directions. Other noteworthy conditions for the problem include a CFL value of 0.4 and an adiabatic index of γ = 5/3. The simulation is run until t_max = 10, when the vortex returns exactly to the location where it was at t = 0.
We have plotted the L 1 and L 2 norms of error for the pressure and x-component of the magnetic field as obtained from the solution at t = 10 in Figure 12. As we can see, the results follow a second-order convergence rate, which is expected given our numerical method.
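The convergence study boils down to computing discrete error norms at several resolutions and fitting the observed order from err ∝ h^p. A minimal sketch (hypothetical helpers, not SADHANA's analysis scripts):

```python
import numpy as np

def error_norms(num, exact, cell_volume):
    """Discrete, volume-weighted L1 and L2 error norms over a set of cells."""
    e = np.abs(np.asarray(num) - np.asarray(exact))
    vol = np.asarray(cell_volume)
    l1 = np.sum(e * vol) / np.sum(vol)
    l2 = np.sqrt(np.sum(e**2 * vol) / np.sum(vol))
    return l1, l2

def convergence_order(h_coarse, err_coarse, h_fine, err_fine):
    """Observed order p from two resolutions, assuming err ~ C h^p."""
    return np.log(err_coarse / err_fine) / np.log(h_coarse / h_fine)
```

A scheme that halves the error by a factor of four when the mesh spacing is halved yields p = 2, the second-order rate reported in Figure 12.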

2D Orszag-Tang Vortex
This problem was first studied in Orszag & Tang (1979) for the incompressible MHD equations. Many authors have since used the Orszag-Tang vortex for the compressible MHD equations as a test of how robustly the employed numerical scheme handles the formation and interaction of MHD shocks. We set the problem up in a computational domain defined as [0 ≤ x ≤ 1, 0 ≤ y ≤ 1]. With an adiabatic index γ = 5/3, the initial condition for the problem is (ρ, v_x, v_y, v_z, p, B_x, B_y, B_z) = (25/(36π), −sin(2πy), sin(2πx), 0, 5/(12π), −sin(2πy)/√(4π), sin(4πx)/√(4π), 0).
We consider periodic boundaries in all directions and run the simulation until t_max = 1.0 with CFL = 0.4. Figure 13 shows the results at different times.
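In the standard Orszag-Tang setup, B_x depends only on y and B_y only on x, so the initial field is divergence-free by construction. A small NumPy check of this property (`orszag_tang_b` is a hypothetical helper; the normalization B_0 = 1/√(4π) is the commonly used one):

```python
import numpy as np

def orszag_tang_b(x, y):
    """Magnetic field of the standard compressible Orszag-Tang vortex:
    Bx = -B0 sin(2 pi y), By = B0 sin(4 pi x), with B0 = 1/sqrt(4 pi)."""
    b0 = 1.0 / np.sqrt(4.0 * np.pi)
    return -b0 * np.sin(2.0 * np.pi * y), b0 * np.sin(4.0 * np.pi * x)

# Cell-centered periodic grid on [0, 1]^2 and a central-difference div B.
n = 64
h = 1.0 / n
x, y = np.meshgrid((np.arange(n) + 0.5) * h, (np.arange(n) + 0.5) * h,
                   indexing="ij")
bx, by = orszag_tang_b(x, y)
divb = (np.roll(bx, -1, axis=0) - np.roll(bx, 1, axis=0)) / (2.0 * h) \
     + (np.roll(by, -1, axis=1) - np.roll(by, 1, axis=1)) / (2.0 * h)
```

Because ∂B_x/∂x and ∂B_y/∂y both vanish identically here, the discrete divergence is zero to round-off, independent of resolution.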

2D Cloud-Shock Interaction
This problem describes a strong shock wave interacting with a dense cloud. We use the well-known setup described in Dai & Woodward (1994) and used in Toth (2000). We have the ambient condition Q_a, with a strong shock at x_s = 1.2, covering the computational domain of [0 ≤ x ≤ 2, 0 ≤ y ≤ 1], and a high-density circular cloud with radius r_c = 0.15 located at x_c = 1.6, y_c = 0.5. The cloud properties are defined by Q_c such that the cloud is in hydrostatic equilibrium with the ambient medium. Figure 14 shows how the solution evolves in terms of density and thermal-pressure contours at t = 0.0, 0.05, and 0.10; for panels I.(a)-(c), each AMR block has 4 × 4 nodes, while for panels II.(a)-(c), each AMR block has 4 × 8 nodes. Since we use a maximum refinement level of 7 for both simulations, the peak resolution remains the same, i.e., 256 × 512. Like the test in Section 4.1.5, this case demonstrates the capability of SADHANA to use both square and rectangular blocks, here for MHD problems.

Computational Performance and Scaling
To assess the computational efficiency of SADHANA, we have run several strong- and weak-scaling tests. The isentropic (Section 4.1.3) and isodensity (Section 4.2.3) vortex problems, extended to 3D with [−5 ≤ z ≤ 5], are chosen as model problems for the HD and MHD scaling tests, respectively. Each block has 16³ computational cells for the strong-scaling tests, and we run the simulation for 1000 time steps. We vary the number of MPI processes as N_p = 4, 8, 16, 32, 64, 128, 256 and record the time taken for each case. Afterward, for each case, the computational time is normalized by the time taken for N_p = 4 to obtain the final speedup. The weak-scaling test for our AMR framework is trickier because we need to maintain the same computational load for each MPI process. As such, the resolution of the simulation grows as we increase the number of MPI processes (N_p), and the refinement criteria must be modified to ensure that the mesh is refined in the same regions for all cases. Each MPI process in our weak-scaling tests has eight blocks, and each block has 16³ computational cells, as in the strong-scaling tests. The HPC cluster uses Intel Xeon Platinum 8280L CPUs and employs the Simple Linux Utility for Resource Management (SLURM) as its workload manager. Figure 15 shows the final result of our scaling tests, which is more than adequate for our present use.
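The normalization described above can be sketched as follows (`strong_scaling` is a hypothetical helper; speedup is measured relative to the smallest run, here N_p = 4, and efficiency additionally divides out the process-count ratio):

```python
def strong_scaling(n_procs, times):
    """Speedup and parallel efficiency normalized to the smallest run.

    speedup[i]    = times[0] / times[i]
    efficiency[i] = speedup[i] * n_procs[0] / n_procs[i]
    so a perfectly scaling code has efficiency 1 at every process count.
    """
    n0, t0 = n_procs[0], times[0]
    speedup = [t0 / t for t in times]
    efficiency = [s * n0 / n for s, n in zip(speedup, n_procs)]
    return speedup, efficiency
```

With the baseline at N_p = 4, ideal strong scaling doubles the speedup each time the process count doubles, which is the reference slope usually drawn in plots such as Figure 15.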

Summary and Conclusion
We have described the relevant algorithms used in a newly developed AMR framework called SADHANA. A new definition of finite-volume block boundaries is given, which can assist the development of block-structured AMR codes. A novel list-based method is also presented to generate multidimensional Hilbert SFCs, with applications beyond AMR. A second-order accurate reconstruction scheme has been used throughout the paper, and approximate Riemann solvers for HD and MHD systems are implemented. While in this work we have limited ourselves to the implementation of ideal HD and MHD physics to test and demonstrate the capabilities of SADHANA, more advanced physics can be added with little effort. Throughout our numerical tests, our solver has proven to be a robust and versatile numerical framework, and it can be used to investigate realistic astrophysical problems in the future. SADHANA is capable of using an arbitrary even number of grid points in each direction, which makes it a very good candidate for the implementation of high-order schemes in the future. It should be explicitly noted that at present, we have only used a Cartesian coordinate system in our code; extensions to other coordinate systems are left for future development. We have also refrained from using any subcycling approach in our implementation of the time integration. The solver currently uses cell-centered data for each computational cell. While we intend to modify SADHANA to handle staggered data in the future, it is not on our priority list at present. For postprocessing purposes, SADHANA saves all block definitions and the associated variables within the blocks in a file, and Python-based scripts are used to plot and analyze these data. We intend to provide a better option for viewing and processing the output data in the future.

Appendix A HLLC Riemann Solver
The Harten-Lax-van Leer-Contact (HLLC) solver is used for adiabatic hydrodynamic problems in SADHANA.
It is a modification of the widely used Harten-Lax-van Leer (HLL) solver pioneered by Harten et al. (1983); further developments in this direction were made by Toro & Chakraborty (1994) and Batten et al. (1997). For brevity, we write q_L and q_R for the left and right states q̂^{L/R}_{i+1/2,j,k}. The conservative states u_{L/R} are calculated from their primitive counterparts q_{L/R} following the specific EOS (adiabatic hydrodynamics in this case). The Riemann fan is divided into an intermediate region, with states u*_L and u*_R, bounded by three waves with speeds S_L, S_M, and S_R. The values of these speeds are estimated from q_L and q_R as S_L = min(v_{x,L} − c_L, v_{x,R} − c_R), S_R = max(v_{x,L} + c_L, v_{x,R} + c_R), and S_M = [ρ_R v_{x,R}(S_R − v_{x,R}) − ρ_L v_{x,L}(S_L − v_{x,L}) + p_L − p_R] / [ρ_R(S_R − v_{x,R}) − ρ_L(S_L − v_{x,L})], where c_{L/R} are the sound speeds of the two states. Afterward, we estimate the intermediate states u*_L and u*_R by applying the Rankine-Hugoniot condition across S_L and S_R. And finally, we have the HLLC flux function F_HLLC = F_L for 0 ≤ S_L; F_L + S_L(u*_L − u_L) for S_L ≤ 0 ≤ S_M; F_R + S_R(u*_R − u_R) for S_M ≤ 0 ≤ S_R; and F_R for S_R ≤ 0.
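A textbook sketch of an HLLC flux for the 1D Euler equations, using simple bounding-speed estimates and the Batten et al. (1997) contact-speed formula. This is an illustration of the scheme under those choices, not a transcription of SADHANA's implementation:

```python
import numpy as np

GAMMA = 1.4

def euler_flux(rho, u, p):
    """Physical Euler flux of the conserved vector (rho, rho u, E)."""
    E = p / (GAMMA - 1.0) + 0.5 * rho * u * u
    return np.array([rho * u, rho * u * u + p, u * (E + p)])

def hllc_flux(qL, qR):
    """HLLC interface flux for 1D Euler; q = (rho, u, p)."""
    (rL, uL, pL), (rR, uR, pR) = qL, qR
    cL, cR = np.sqrt(GAMMA * pL / rL), np.sqrt(GAMMA * pR / rR)
    SL, SR = min(uL - cL, uR - cR), max(uL + cL, uR + cR)
    if SL >= 0.0:                       # fully supersonic to the right
        return euler_flux(rL, uL, pL)
    if SR <= 0.0:                       # fully supersonic to the left
        return euler_flux(rR, uR, pR)
    # contact-wave (middle) speed, Batten et al. (1997)
    SM = (pR - pL + rL * uL * (SL - uL) - rR * uR * (SR - uR)) \
         / (rL * (SL - uL) - rR * (SR - uR))
    # pick the side of the contact the interface lies on
    if SM >= 0.0:
        r, u, p, S = rL, uL, pL, SL
    else:
        r, u, p, S = rR, uR, pR, SR
    E = p / (GAMMA - 1.0) + 0.5 * r * u * u
    U = np.array([r, r * u, E])
    # star state from the Rankine-Hugoniot condition across S
    rs = r * (S - u) / (S - SM)
    Es = rs * (E / r + (SM - u) * (SM + p / (r * (S - u))))
    Us = np.array([rs, rs * SM, Es])
    return euler_flux(r, u, p) + S * (Us - U)
```

For identical left and right states, the HLLC flux collapses to the exact physical flux, which is a useful sanity check on any Riemann-solver implementation.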

Appendix B HLLD Riemann Solver
SADHANA uses the Harten-Lax-van Leer Discontinuities (HLLD) Riemann solver, as proposed in Miyoshi & Kusano (2005), for adiabatic MHD problems. Some modifications are made to the classic HLLD algorithm to accommodate the EGLM-MHD system (or rather the GLM-MHD system, because the structure of the HLLD solver does not change regardless of whether we consider GLM or EGLM) without changing the basic HLLD scheme. As in the case of the HLLC solver, we use q_L and q_R for the left and right states instead of q̂^{L/R}_{i+1/2,j,k}. At first, we calculate the conservative states u_{L/R} from their primitive counterparts q_{L/R}; afterward, we calculate the variables responsible for the construction of our GLM system. In the HLLD approach, the Riemann fan is divided into four intermediary states u*_L, u**_L, u**_R, and u*_R obtained between five waves, namely two fast magnetosonic waves, two rotational (Alfvén) waves, and the entropy wave. The corresponding speeds S_L, S_R, S_M, S*_L, and S*_R in the Riemann fan are estimated from the components of both q_L and q_R, and the total pressure inside the Riemann fan is taken to have a constant value p*_T. We should note that according to this formulation, all other components of u*_K need to be estimated before we calculate the energy term. We also need to avoid a division by zero when ρ_K(S_K − v_{x,K})(S_K − S_M) − B_x² = 0; in these cases, we assume u*_K = u_K. To estimate the conservative variables u**_L and u**_R related to the other two intermediate states, we again apply the Rankine-Hugoniot condition for the waves S*_L and S*_R with the assumption that ρ**_K = ρ*_K, which again comes from applying the continuity equation over the jumps with the same x-direction velocity.
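The outer wave speeds of an HLLD-type solver are built from the fast magnetosonic speed along the interface normal. A minimal sketch (hypothetical helpers; `hlld_outer_speeds` uses one common bounding choice, cf. Miyoshi & Kusano 2005, and the sample states in the test are illustrative only):

```python
import numpy as np

def fast_speed(rho, p, bx, by, bz, gamma=5.0 / 3.0):
    """Fast magnetosonic speed along x:
    c_f^2 = [ (gamma p + |B|^2)
              + sqrt((gamma p + |B|^2)^2 - 4 gamma p Bx^2) ] / (2 rho)."""
    b2 = bx * bx + by * by + bz * bz
    a = gamma * p + b2
    return np.sqrt((a + np.sqrt(a * a - 4.0 * gamma * p * bx * bx))
                   / (2.0 * rho))

def hlld_outer_speeds(qL, qR, gamma=5.0 / 3.0):
    """Simple bounding estimates for the outermost waves:
    S_L = min(vx_L, vx_R) - max(cf_L, cf_R), and symmetrically for S_R.
    Each state q is (rho, vx, p, (Bx, By, Bz))."""
    (rL, vL, pL, bL), (rR, vR, pR, bR) = qL, qR
    cf = max(fast_speed(rL, pL, *bL, gamma),
             fast_speed(rR, pR, *bR, gamma))
    return min(vL, vR) - cf, max(vL, vR) + cf
```

With the magnetic field switched off, the fast speed reduces to the ordinary sound speed sqrt(γp/ρ), which provides a quick consistency check against the hydrodynamic limit.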