PARALLELIZATION METHODS FOR SOLVING THREE-TEMPERATURE RADIATION-HYDRODYNAMIC PROBLEMS

Abstract. An efficient parallelization method for numerically solving Lagrangian radiation-hydrodynamic problems with three-temperature modeling on structured quadrilateral grids is presented. The three-temperature heat conduction equations are discretized by an implicit scheme, and their computational cost is very high. Thus a parallel iterative method for the three-temperature system of equations is constructed, which is based on domain decomposition of the physical space and combined with fixed-point (Picard) nonlinear iteration to solve the sub-domain problems. It avoids global communication and can be implemented naturally on massively parallel computers. The space discretization of the heat conduction equations uses the well-known local support-operators method (LSOM). Numerical experiments show that the parallel iterative method preserves the same accuracy as the fully implicit scheme and has high parallel efficiency and good stability, so it provides an effective solution procedure for the numerical simulation of radiation-hydrodynamic problems on parallel computers.


1. Introduction. The radiation-hydrodynamic flow problem is a classic multiphysics problem coupling radiation and hydrodynamics, which occurs in many applications such as inertial confinement fusion (ICF), strong explosions and astrophysical systems. When the electron, ion and radiation field are each in local thermodynamic equilibrium, each having its own temperature, the so-called three-temperature radiation-hydrodynamic model is formed. In some physical processes, radiation conduction and its interaction with matter (electrons) via energy exchange have a substantial effect on both the state and the motion of the materials. Since the particle associated with the radiation field is the photon, we will not distinguish between the terms "radiation" and "photon" in the following.
Radiative processes happen on time scales that differ by many orders of magnitude from those characterizing hydrodynamic flow. This leads to significant computational challenges in the efficient numerical simulation of radiation-hydrodynamic problems. The usual solution procedure for radiation-hydrodynamic problems is the splitting method, which is based on the splitting of two distinct physical processes (radiation and hydrodynamic flow). First the Lagrangian hydrodynamic equations are solved by an explicit time discretization together with a staggered or cell-centered space discretization; we refer to [16] and [11] for staggered schemes and [7] for cell-centered schemes. Here the Lagrangian grid moves with the fluid, and distorted cells appear as the flow becomes complex. Then, on the Lagrangian grid, the three-temperature equations for photon, electron and ion conduction are solved by a fully implicit time discretization and finite volume schemes, e.g., LSOM [8] and the diamond scheme [21], for the space discretization.
The fully implicit scheme is used in solving the three-temperature heat conduction equations, including radiation conduction or diffusion, for two reasons. The first is physical: in the physical processes we are concerned with, the time scales of a photon flight over a characteristic structural length l or over a photon mean free path λ_p, i.e., l/c or λ_p/c, are very small, where c is the photon velocity in vacuum. Moreover, the processes of energy exchange among photon, electron and ion are fast, in some cases even faster than the heat conduction, so the energy exchange processes give rise to strongly stiff terms in the system of radiation-hydrodynamic equations. The second reason is numerical stability: the implicit scheme is unconditionally stable, while the explicit one is not. The well-known CFL condition for the explicit scheme of hydrodynamic flow is c_s ∆t/∆x ≤ 1, where c_s is a strictly positive number related to the sound speed. The analogous condition for the explicit scheme of heat conduction is κ ∆t/∆x² ≤ 1/2, where κ is the conduction coefficient, so the time-step must be very small when ∆x is small or κ is large. Note that for multi-material problems the mass densities of different materials may differ by several orders of magnitude, and κ may depend on certain powers of the density. Therefore the explicit scheme for heat conduction cannot be used in practical simulations.
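For concreteness, the two stability bounds can be turned into time-step limits; a minimal sketch in Python, with illustrative numbers that are not taken from the paper:

```python
# Explicit stability limits sketched from the text:
#   flow:       c_s * dt / dx      <= 1
#   conduction: kappa * dt / dx**2 <= 1/2
# Illustrative values only; the point is the dx**2 scaling.

def explicit_dt_flow(dx, c_s):
    """Largest dt allowed by the CFL condition c_s*dt/dx <= 1."""
    return dx / c_s

def explicit_dt_conduction(dx, kappa):
    """Largest dt allowed by kappa*dt/dx**2 <= 1/2."""
    return 0.5 * dx ** 2 / kappa

dx = 1e-3
print(explicit_dt_flow(dx, c_s=1.0))          # dt limit scales like dx
print(explicit_dt_conduction(dx, kappa=1.0))  # dt limit scales like dx**2
```

Halving ∆x halves the hydrodynamic limit but quarters the conduction limit, and a large κ (e.g., in a low-density material) shrinks it further, which is why the conduction step is taken implicitly.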
From the implicit discretization of the three-temperature heat conduction equations, a (global) nonlinear algebraic system is formed at each time-step. The computational cost of solving this system is very high, and we have to use massively parallel computers in order to obtain numerical solutions within an acceptable time. However, the implicit scheme cannot be implemented directly on a parallel computer since there is strong data coupling; moreover, a large amount of global communication is required when solving the nonlinear algebraic system iteratively. There are two kinds of existing parallelization approaches. One is based on algebraic parallelization without any modification to the implicit scheme, but it cannot avoid global communication, see [13]. Therefore we consider the other approach, in which the implicit scheme is modified at the interfaces among neighboring sub-domains by domain decomposition; the resulting scheme is called a parallel scheme. It is necessary to construct a parallel scheme that avoids global communication for solving Lagrangian radiation-hydrodynamic problems on massively parallel computers.
Each of the three-temperature equations is a parabolic equation. Almost all of the work on parallel schemes is devoted to parabolic problems on one-dimensional or two-dimensional rectangular grids (see [18], [19], [20]). Usually a parallel scheme is a non-overlapping, non-iterative domain decomposition method with a suitable treatment of the inner boundary conditions for the sub-domain problems. In [1] a finite difference domain decomposition algorithm is constructed in which an explicit scheme on a coarse mesh is used at the interface, so it can take a larger time step than that allowed by the classical explicit scheme. In [2] an unconditionally stable domain decomposition method is presented with lower accuracy O(√∆t + h), where ∆t and h are the time and space steps respectively. This method is improved in [5], but the space accuracy remains first order. In [3] an anisotropic heat conduction problem in magnetized plasmas in cylindrical coordinates is solved by using a parallel scheme constructed in [22], where the values at the interface of sub-domains are predicted by the Du Fort-Frankel scheme. In [25] a stabilized explicit-implicit domain decomposition method is proposed, which is also unconditionally stable. In [6], [12], [14] and [24] some unconditionally stable parallel difference schemes with second-order accuracy are presented, but they are mainly adapted to rectangular grids. A conservative domain decomposition method for nonlinear diffusion problems on quadrilateral grids is proposed in [21], in which the diamond scheme (one of the nine-point schemes on quadrilateral grids) with Dirichlet and Neumann conditions at the inner boundary is adopted for solving the sub-domain problems.
In [9] a parallel Newton-Krylov-Schwarz (NKS)-based nonlinearly implicit algorithm is presented for the numerical solution of the unsteady nonlinear multi-material radiation diffusion problem in two-dimensional space; the numerical results show that the parallel NKS is CPU-time scalable and nonlinearly scalable, but not linearly scalable in terms of iteration numbers. A comprehensive monograph on domain decomposition methods is [15]. Recently a Schwarz iterative algorithm with Robin inner boundary conditions was proposed in [4] for solving the neutron diffusion equation, which is linear and discretized with Raviart-Thomas-Nédélec finite elements. However, in all of the above studies the computational grids are fixed (Eulerian) meshes. To our knowledge, there is no study devoted to parallel schemes for solving Lagrangian radiation-hydrodynamic problems.
Our focus is to construct a parallel iterative method for solving the three-temperature equations accurately and efficiently. The contributions of this paper include: (i) the validity and applicability of the nonlinear diffusion solver are verified on Lagrangian grids, which may become more and more distorted as the flow develops; moreover the solver has a distinguishing feature: the implementation of LSOM is nested in the Picard nonlinear iteration and combined with the domain decomposition method; (ii) the validity and applicability of the parallel iterative method are verified for solving Lagrangian radiation-hydrodynamic problems; in particular, a complete simulation of the whole physical process of a multi-material Lagrangian radiation-hydrodynamic problem in cylindrical coordinates is performed efficiently; (iii) our parallelization of LSOM makes it a practical method for solving radiation-hydrodynamic problems, since the cost of its sequential computation is prohibitive in practice; note that a cell-centered diffusion scheme (e.g., the diamond scheme) usually has only cell-centered primary unknowns, whereas LSOM has both cell-centered and cell-edge primary unknowns, so the cost of LSOM is at least three times that of a cell-centered scheme.
This paper is organized as follows. In section 2 the discretization of the three-temperature radiation-hydrodynamic equations is introduced. In section 3 the LSOM method for nonlinear diffusion equations is presented, and in section 4 the Picard iteration method for solving the resulting nonlinear system is described. In section 5 a parallel iteration method for solving the three-temperature radiation-hydrodynamic equations is constructed. Numerical experiments are given in section 6 to demonstrate the accuracy, stability and parallel efficiency of the parallelization method. The conclusion is summarized in section 7.

2. Discretization of three-temperature radiation-hydrodynamic equations.
We consider the following system of radiation-hydrodynamic equations with three temperatures in two-dimensional cylindrical coordinates (x, r):

    dρ/dt + ρ∇ · U = 0,    (1)

    ρ dU/dt + ∇P = 0,    (2)

    ρ C_vα dT_α/dt + ρ (P_α + qχ_α + ∂ε_α/∂V) dV/dt + ∇ · F_α = S_α,  α = e, i, p,    (3)

where d/dt denotes the material or Lagrangian time derivative, and α = e, i, p represents electron, ion and photon respectively. Here ρ = ρ(x, r, t) is the mass density, U = (u(x, r, t), v(x, r, t)) the flow velocity, P = P_e + P_i + P_p + q the total pressure, q the viscosity, V = 1/ρ the specific volume, and χ_α = 1 for α = i, χ_α = 0 for α = e, p. C_vα, ε_α, P_α, T_α are the specific heat, internal energy, pressure and temperature of electron, ion and photon respectively. F_α = −κ_α∇T_α and κ_α are the heat flux and conduction coefficient of electron, ion and photon respectively. S_α are the energy exchange terms plus source terms:

    S_e = w_ei (T_i − T_e) + w_ep (T_p − T_e) + W_e,
    S_i = w_ei (T_e − T_i) + W_i,
    S_p = w_ep (T_e − T_p),

where w_ei (or w_ep) is the exchange coefficient of electron-ion (or electron-photon) energy, and W_e and W_i are given sources. In general w_ei, w_ep, C_vα and κ_α are nonlinear functions of ρ and T_α.
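The exchange structure of S_α can be checked numerically: the w_ei and w_ep terms cancel in pairs, so summing the three contributions leaves only the external sources. A small sketch, assuming the linearized exchange form described in the text:

```python
# Sketch (assumed from the text) of the exchange/source terms:
#   S_e = w_ei*(T_i - T_e) + w_ep*(T_p - T_e) + W_e
#   S_i = w_ei*(T_e - T_i) + W_i
#   S_p = w_ep*(T_e - T_p)
# The exchange parts cancel pairwise, so S_e + S_i + S_p = W_e + W_i.

def exchange_terms(T_e, T_i, T_p, w_ei, w_ep, W_e=0.0, W_i=0.0):
    S_e = w_ei * (T_i - T_e) + w_ep * (T_p - T_e) + W_e
    S_i = w_ei * (T_e - T_i) + W_i
    S_p = w_ep * (T_e - T_p)
    return S_e, S_i, S_p

S_e, S_i, S_p = exchange_terms(1.0, 2.0, 0.5, w_ei=3.0, w_ep=0.7)
print(S_e + S_i + S_p)  # zero (up to roundoff): exchange conserves energy
```

This pairwise cancellation is exactly why the exchange terms are stiff but energy-conserving: they redistribute energy among the three fields without creating any.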
The set of equations (1)-(3) describes two kinds of physical processes coupled tightly: one is the nonlinear interaction of the hydrodynamics, and the other is the heat conduction of electron, ion and photon due to temperature gradients. The system is closed once the equations of state P_α = P_α(ρ, T_α), ε_α = ε_α(ρ, T_α) are given. The solution procedure is formulated by a physics-based splitting approach. On each time level, the first step is that the hyperbolic part of the hydrodynamics is solved by an explicit scheme, which can be implemented on parallel computers easily. The second step is that the parabolic three-temperature part is solved by an implicit scheme, where the work terms (P_α + qχ_α + ∂ε_α/∂V) dV/dt are supplied by the first step. Moreover, the computational mesh in the second step is determined by the first step.
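The splitting loop described above can be sketched as follows; the function bodies are placeholders (assumed names, not the paper's code), kept only to show the order of the two sub-steps:

```python
# Sketch of the physics-based splitting. The two sub-steps are
# stand-ins; the point is the order: explicit hydrodynamics first,
# implicit three-temperature conduction second, on the new mesh.

def explicit_hydro_step(state, dt):
    # placeholder: would update velocity and mesh, and record the
    # work terms (P_a + q*chi_a + d eps_a/dV) dV/dt for step 2
    return dict(state, mesh_time=state["mesh_time"] + dt)

def implicit_three_temperature_solve(state, dt):
    # placeholder: would solve the linearized 3T system implicitly
    # on the mesh produced by step 1
    return dict(state, t=state["t"] + dt)

def advance_one_step(state, dt):
    state = explicit_hydro_step(state, dt)               # step 1: hyperbolic part
    state = implicit_three_temperature_solve(state, dt)  # step 2: parabolic part
    return state

state = advance_one_step({"t": 0.0, "mesh_time": 0.0}, 1e-5)
print(state)
```

Step 1 supplies both the work terms and the (possibly distorted) Lagrangian mesh on which step 2 operates.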
Equation (1) is the mass conservation equation, which leads to an algebraic relation: the mass of each cell on the (n + 1)-th time level is equal to that on the n-th time level. Equation (2) is the momentum conservation equation, and it is solved by the explicit time discretization and the staggered space discretization, i.e., the thermodynamic unknowns (density, pressure) are defined at the cell centers and the flow velocity is defined at the cell nodes. Let X_0 = (x_0, r_0) be a node of the grid in the X = (x, r) plane, with node velocity U_0 = (u_0, v_0). The node velocity is advanced by the Wilkins scheme:

    U_0^(n+1) = U_0^n − (∆t^n / M_0) Σ_k P_k l_k n_k,    M_0 = (1/4) Σ_k (ρ)_k (A)_k,    (4)

where the sum runs over the edges of the dual cell, n_k is the outward unit normal and l_k the length of the k-th edge, and (ρ)_k and (A)_k are the density and area of cell k respectively. (4) comes from the discretization of (2) on the dual cell (1234) with dashed lines as cell edges in Fig. 1, which depicts the stencil of the Wilkins scheme.

The three-temperature energy equations (3) are parabolic equations, which are strongly nonlinear and stiff; moreover the coefficients are discontinuous for multi-material problems. When the implicit time discretization and LSOM space approximation are applied to (3), a large nonlinear discrete system is formed whose solution is very time-consuming; moreover it is difficult to implement naturally on a parallel computer. So a parallelization approach will be proposed in section 5, and a new discrete system is formed, which consists of smaller, decoupled discrete systems and can be solved in parallel on a parallel computer.

3. LSOM for solving diffusion equations. Consider the following model equation for nonlinear diffusion:

    C(u) ∂u/∂t + ∇ · F = Q(X, t, u),    (5)

where F = −κ(X, t, u)∇u is the heat flux, u = u(X, t) is the function to be solved, Ω is a polygonal domain in R², X = (x, r) ∈ Ω, C(u) and κ are positive functions which depend on u and may be discontinuous on Ω, and Q is a given nonlinear function.
For solving the equation (5) we will use the local support-operators method (LSOM) of [8], which has the following properties: (1) it gives almost second-order accuracy on distorted grids even with material discontinuities; (2) it has a local stencil; (3) its coefficient matrix is symmetric positive-definite.
The domain Ω is divided into a structured quadrilateral mesh with cell index (i, j). The finite volume discretization of (5) can be written as

    S_{i,j} C(u_{i,j}) (u_{i,j}^(n+1) − u_{i,j}^n)/∆t + Σ_{k=1..4} F_{i,j,k} = S_{i,j} Q_{i,j},    (6)

where the unknown defined at the cell center is denoted by u_{i,j}, and the unknown defined at each cell edge is denoted by u_{i,j,k}, 1 ≤ k ≤ 4 (see Fig. 2). The outward normal flux crossing each edge is denoted by F_{i,j,k}, 1 ≤ k ≤ 4, where the subscript k indicates the k-th edge of the cell (i, j). The area of the cell (i, j) is denoted by S_{i,j}. The normal flux F_{i,j,k} for LSOM is expressed in terms of the unknowns u_{i,j} and u_{i,j,k} as

    (F_{i,j,1}, ..., F_{i,j,4})^T = A_{i,j} (u_{i,j} − u_{i,j,1}, ..., u_{i,j} − u_{i,j,4})^T,    (7)

where A_{i,j} is a 4 × 4 symmetric matrix depending on the geometric quantities of the cell, with off-diagonal entries

    a_{k,k+1} = w_k cos φ_{k,k+1},    (8)

where w_k is a weight coefficient, l_k is the length of the k-th edge of the cell (i, j), and φ_{k,k+1} is the k-th internal angle of the cell (i, j) (see Fig. 2). The subscript k loops from 1 to 4, i.e., for k = 4 the index k + 1 is 1. To complete the system (6)-(7), two continuity conditions are imposed at each interior edge: the edge unknowns of the two cells sharing the edge coincide, and their outward normal fluxes sum to zero; for example, for the first edge of the cell (i, j), the edge unknown u_{i,j,1} equals the corresponding edge unknown of the neighboring cell, and F_{i,j,1} plus the neighbor's outward flux through this edge vanishes; these two relations constitute (9). The LSOM of [8] is obtained by a particular choice of the weights w_k involving sin φ_{k,k+1} (k = 1, 2, 3, 4).
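The role of the two continuity conditions can be seen in a one-dimensional analogue (an illustration of the cell-center/edge-unknown structure, not LSOM itself): each cell carries a center value and half-cell fluxes to its faces, and requiring the two outward fluxes at a shared face to cancel determines the face unknown.

```python
# 1D analogue of the cell-center/edge-unknown structure. With uniform
# conductivity kappa and cell width dx, the outward fluxes of the two
# cells sharing a face are kappa*(u_left - u_f)/(dx/2) and
# kappa*(u_right - u_f)/(dx/2); flux continuity (their sum is zero)
# fixes the face unknown u_f.

def face_value(u_left, u_right, kappa, dx):
    # flux continuity gives u_f = (u_left + u_right)/2 for uniform kappa
    return 0.5 * (u_left + u_right)

def face_flux(u_left, u_right, kappa, dx):
    u_f = face_value(u_left, u_right, kappa, dx)
    # half-cell flux from the left cell to the face
    return kappa * (u_left - u_f) / (0.5 * dx)

print(face_flux(2.0, 1.0, kappa=3.0, dx=0.1))  # kappa*(u_left-u_right)/dx
```

Eliminating the face unknown recovers the usual two-point flux; in LSOM the analogous elimination couples the four edges of each quadrilateral through the matrix A_{i,j}.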
A generalization of LSOM is discussed in [17] by taking different weight coefficients w_k, and is named the CFM (Cell Functional Minimization) scheme.
In the LSOM formulation, the primary unknowns include both the cell-centered and the edge-centered unknowns, and the normal flux crossing an edge is given by (7). The fact that the edge-centered unknowns are primary ones is helpful in the construction of parallel iterative methods, since the values on the inner boundaries are needed in solving the sub-domain problems.

4. Nonlinear iteration method. We use the Picard fixed-point iteration method to solve the nonlinear discrete system (6), (7) and (9).
In the following, the superscript n + 1 identifying the current time level is omitted for simplicity, so the discrete function u_{i,j}^(n+1) is denoted simply by u_{i,j}. The superscript n of u^n at the previous time level is kept to avoid confusion. Let s be the nonlinear iteration index. Then the Picard nonlinear iteration procedure is devised as follows:

    S_{i,j} C(u_{i,j}^(s)) (u_{i,j}^(s+1) − u_{i,j}^n)/∆t + Σ_{k=1..4} F_{i,j,k}^(s+1) = S_{i,j} Q(u_{i,j}^(s)),    (10)

where F_{i,j,k}^(s+1) is given by (7) with the conductivity evaluated at u^(s), and the values in all the nonlinear coefficients C(u), κ(X, t, u) and Q(X, t, u) are taken from the previous iteration step so as to form a linearized system.
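A minimal Picard iteration of this flavor, written for a one-dimensional nonlinear diffusion equation with Dirichlet boundaries (an illustration only, not the paper's two-dimensional LSOM solver): the coefficients are frozen at iterate s and a linear system is solved for iterate s + 1.

```python
import numpy as np

def picard_step(u, u_old, dt, dx, C, kappa, Q):
    # One linearized backward-Euler step for C(u) u_t = (kappa(u) u_x)_x + Q:
    # all nonlinear coefficients are evaluated at the previous iterate u.
    n = len(u)
    A = np.zeros((n, n)); b = np.zeros(n)
    k = kappa(u)  # frozen conductivity
    for i in range(1, n - 1):
        kl = 0.5 * (k[i - 1] + k[i]); kr = 0.5 * (k[i] + k[i + 1])
        A[i, i - 1] = -kl / dx**2
        A[i, i + 1] = -kr / dx**2
        A[i, i] = C(u[i]) / dt + (kl + kr) / dx**2
        b[i] = C(u[i]) * u_old[i] / dt + Q(u[i])
    # Dirichlet boundaries held at the old boundary values
    A[0, 0] = A[-1, -1] = 1.0; b[0] = u_old[0]; b[-1] = u_old[-1]
    return np.linalg.solve(A, b)

def picard_solve(u_old, dt, dx, C, kappa, Q, tol=5e-7, max_iter=50):
    u = u_old.copy()
    for s in range(max_iter):
        u_new = picard_step(u, u_old, dt, dx, C, kappa, Q)
        if np.max(np.abs(u_new - u)) < tol:  # max-norm stopping criterion
            return u_new, s + 1
        u = u_new
    return u, max_iter

x = np.linspace(0.0, 1.0, 21)
u1, iters = picard_solve(x.copy(), dt=0.1, dx=x[1] - x[0],
                         C=lambda u: 1.0,
                         kappa=lambda u: 1.0 + u**2,
                         Q=lambda u: 0.0)
```

Each pass through `picard_step` is one linearized solve; the stopping criterion mirrors the max-norm residual test used in the experiments below.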

5. Parallel iterative method. In this section, we present a parallel iterative method for solving nonlinear diffusion equations, which is the main part of our parallelization method for solving the three-temperature radiation-hydrodynamic equations. For the sake of clarity, the global domain Ω is decomposed into two overlapping sub-domains Ω_1 and Ω_2, see Fig. 3. Denote Γ_12 = ∂Ω_1 ∩ Ω_2 and Γ̃_12 = Ω_1 ∩ ∂Ω_2, where Γ_12 and Γ̃_12 are the inner boundaries of the sub-domains Ω_1 and Ω_2 respectively. The parallel iterative method is based on domain decomposition and devised as follows. On the sub-domain Ω_1 the linearized problem (10) is solved for u^(s+1) with the Robin condition at the inner boundary

    α u^(s+1) + κ(u^(s)) ∂u^(s+1)/∂n = α ũ^(s) + κ(ũ^(s)) ∂ũ^(s)/∂n  on Γ_12,    (12)-(13)

together with the prescribed boundary condition on ∂Ω_1 ∩ ∂Ω and the initial condition at t = 0. Here F_k^(s+1) is the discrete version of −∫_{l_k} κ(u^(s)) ∂u^(s+1)/∂n dl on the k-th edge of a cell in Ω_1. On the sub-domain Ω_2 the analogous problem is solved for ũ^(s+1), with the roles of u and ũ interchanged and the Robin condition imposed on Γ̃_12 (14)-(15), together with the prescribed boundary condition on ∂Ω_2 ∩ ∂Ω and the initial condition at t = 0. Here F̃_k^(s+1) is the discrete version of −∫_{l_k} κ(ũ^(s)) ∂ũ^(s+1)/∂n dl on the k-th edge of a cell in Ω_2. α is a given nonnegative constant, and we take α = 0.1 and α = 10 in the following test models 1 and 2 respectively.
For each s, both of the problems (12)-(13) and (14)-(15) are linearized and decoupled, and they can be solved in parallel on a parallel computer. Note that the nonlinear Picard iteration is nested within the domain decomposition: the iteration among sub-domains and the nonlinear Picard iteration are performed synchronously.
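A serial sketch of this synchronous exchange on the 1D model problem −u'' = 0, u(0) = 0, u(1) = 1 (the grid, the value of α and the sub-domain layout are illustrative, not the paper's): each "processor" solves its own sub-domain problem, then the Robin data α u + κ ∂u/∂n at the inner boundaries are exchanged.

```python
import numpy as np

def solve_subdomain(n, dx, left, right, alpha, kappa=1.0):
    """left/right are ('dirichlet', value) or ('robin', g) with the
    condition alpha*u + kappa*du/dn = g, n the outward normal."""
    A = np.zeros((n, n)); b = np.zeros(n)
    for i in range(1, n - 1):
        A[i, i - 1] = A[i, i + 1] = -kappa / dx**2
        A[i, i] = 2 * kappa / dx**2
    kind, val = left
    if kind == "dirichlet":
        A[0, 0] = 1.0; b[0] = val
    else:  # outward normal points in -x: du/dn ~ (u[0] - u[1])/dx
        A[0, 0] = alpha + kappa / dx; A[0, 1] = -kappa / dx; b[0] = val
    kind, val = right
    if kind == "dirichlet":
        A[-1, -1] = 1.0; b[-1] = val
    else:  # outward normal points in +x: du/dn ~ (u[-1] - u[-2])/dx
        A[-1, -1] = alpha + kappa / dx; A[-1, -2] = -kappa / dx; b[-1] = val
    return np.linalg.solve(A, b)

dx, alpha = 0.05, 1.0
# sub-domain 1 covers x in [0, 0.6], sub-domain 2 covers x in [0.4, 1]
# (13 points each); the overlap is [0.4, 0.6].
u1 = np.zeros(13); u2 = np.zeros(13)
for it in range(200):
    # Robin data for sub-domain 1 at x = 0.6, computed from u2
    # (x = 0.6 is local index 4 on sub-domain 2's grid):
    g1 = alpha * u2[4] + (u2[4] - u2[3]) / dx
    # Robin data for sub-domain 2 at x = 0.4, computed from u1
    # (x = 0.4 is local index 8 on sub-domain 1's grid; normal is -x):
    g2 = alpha * u1[8] + (u1[8] - u1[9]) / dx
    u1_new = solve_subdomain(13, dx, ("dirichlet", 0.0), ("robin", g1), alpha)
    u2_new = solve_subdomain(13, dx, ("robin", g2), ("dirichlet", 1.0), alpha)
    change = max(np.max(np.abs(u1_new - u1)), np.max(np.abs(u2_new - u2)))
    u1, u2 = u1_new, u2_new
    if change < 1e-12:
        break
```

The converged iterates reproduce the exact linear profile u(x) = x; in the real code the two solves run on different processors and the two g-values travel by local message passing only.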
In our practical implementation the global domain Ω is divided into a structured quadrilateral grid and partitioned into sub-domains of the shape shown in Fig. 4. The grid cells are evenly distributed among the sub-domains, and each sub-domain is assigned to one CPU. Moreover, for the convenience of the parallel computation, each sub-domain is extended by a row and a column of cells along its inner boundary (see the sub-domains with dashed borders in Fig. 4). The overlapping domain decomposition strategy is adopted because it is convenient for reusing the original code in solving the momentum equations, and for implementing various boundary conditions at the inner boundary, including Dirichlet, Neumann and Robin conditions, in solving the three-temperature equations. Now the algorithm of the parallel iterative method is described. To state the main idea clearly, we consider just the case of two sub-domains, as shown in Fig. 5, with a Robin condition at the inner boundary. Sub-domain 1 with the blue broken-line border corresponds to the processor numbered 1, and sub-domain 2 with the red solid-line border corresponds to the processor numbered 2.
The detailed procedure of our parallelization method on the t^(n+1) time level is as follows.
1. Initialize the iteration with the values at the n-th time level and set the initial inner boundary data.
2. Solve the linearized sub-domain problem on each processor in parallel.
3. Calculate the values α ũ^(s+1) + κ(ũ^(s+1)) ∂ũ^(s+1)/∂n at Γ_12 on Processor 2 and α u^(s+1) + κ(u^(s+1)) ∂u^(s+1)/∂n at Γ̃_12 on Processor 1, and then pass them to each other to refresh the inner boundary conditions. The notation i → j in Fig. 5 indicates that the inner boundary conditions are computed on Processor i and then passed to Processor j (i, j = 1, 2).
4. Repeat steps 2 and 3 until the convergence criterion holds.
It is noted that the above procedure also applies when the inner boundary condition is a Dirichlet or Neumann condition. Furthermore, for our parallel iterative method only local communication is needed, i.e., message passing happens only among neighboring processors, which is beneficial for achieving high parallel efficiency.

6. Numerical experiments. In this section we present some numerical examples to show the performance of the parallelization method for solving two-dimensional, three-temperature, Lagrangian radiation-hydrodynamic problems.
In our test problems we use the Thomas-Fermi statistical equation of state for the electron, and the ideal gas equation of state for the ion, P_i = Γ_i ρ T_i, ε_i = (3/2) Γ_i T_i, where Γ_i is a physical constant related to the medium; for the photon, P_r = (a/3) T_r⁴, ε_r = (a/ρ) T_r⁴, where a is the radiation constant. For the choice of the time-step we take ∆t = 10⁻⁵ at the initial time, and its size then varies adaptively according to the number of nonlinear iterations in solving the three-temperature equations and the CFL condition. When the number of nonlinear iterations is larger than 20 or the CFL condition does not hold, the time-step is decreased. When the number of nonlinear iterations is smaller than 10, the time-step is increased, and the increased time-step must satisfy the following constraints: (1) the CFL condition holds; (2) it is not larger than a given bound, e.g., 5 × 10⁻³; (3) the frequency of time-step increases is not larger than a given number, e.g., 0.1, that is, the time-step is increased at most 40 times when 400 time levels are taken as one cycle. We refer to [10] and [23] for a more detailed discussion.
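The adaptive time-step rule described above can be sketched as a small controller; the thresholds (20, 10, the bound 5 × 10⁻³) come from the text, while the growth and shrink factors are assumed for illustration:

```python
# Sketch of the adaptive time-step control. The iteration-count
# thresholds and dt_max follow the text; grow/shrink factors are
# assumed. (The paper additionally caps how often the step may grow:
# at most 40 increases per cycle of 400 time levels.)

def next_dt(dt, n_nonlinear_iters, cfl_ok, dt_max=5e-3,
            grow=1.2, shrink=0.5):
    if n_nonlinear_iters > 20 or not cfl_ok:
        return dt * shrink             # too hard: cut the step
    if n_nonlinear_iters < 10:
        return min(dt * grow, dt_max)  # easy: grow, capped by dt_max
    return dt                          # otherwise keep the step

print(next_dt(1e-5, n_nonlinear_iters=5, cfl_ok=True))   # grows
print(next_dt(1e-5, n_nonlinear_iters=25, cfl_ok=True))  # shrinks
```

A controller of this shape keeps the implicit solver in its fast-convergence regime while letting the step grow when the physics allows it.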
The convergence criterion of the nonlinear parallel iteration for solving the three-temperature equations is that the maximum norm of the nonlinear residuals of the three-temperature equations is smaller than 5 × 10⁻⁷.
6.1. Test Model 1. The initial computational domain is a quarter of a circular disc {(x, r) | √(x² + r²) ≤ 150, x ≥ 0, r ≥ 0} with its initial grid shown in Fig. 6. A boundary temperature

    T_b = 0.01 × (30 − (t/100 − √29)²)

is imposed, and its variation curve is shown in Fig. 7. The initial position of the free surface is a quarter circular arc, and the initial density is

    ρ = 0.014866 for √(x² + r²) ≤ 120,    ρ = 0.076817 for 120 < √(x² + r²) ≤ 150.

In order to verify the accuracy of the parallelization method, we compare the result of using 16 CPUs with that computed on 1 CPU. The initial grid is 20 × 30, which is radial as shown in Fig. 6. Let (m, n) denote the cell located at the m-th column (circumferential direction) and n-th row (axial direction), with 1 ≤ m ≤ 20, 1 ≤ n ≤ 30.
Figs. 8, 9 and 10 show respectively the temperature variation curves of the electron, ion and photon at the cells (10, 1), (10, 6), (10, 7) and (10, 30); the curves computed on 1 CPU (solid lines) and on 16 CPUs coincide. Fig. 11 gives the time-steps and the nonlinear iteration numbers used by the original sequential method on 1 CPU and by the parallelization method on 16 CPUs, respectively. It shows that the parallelization method need not decrease the time-step size; moreover the nonlinear iteration numbers remain the same. Fig. 12 presents the linear iteration numbers used by the original sequential method on 1 CPU and by the parallelization method on multiple CPUs, namely the maximum, over all CPUs from 2 to 16, of the linear iteration numbers used in solving the linear systems resulting from the parallelization method. It shows that the linear iteration numbers decrease as the number of CPUs increases and the size of the linear algebraic systems becomes smaller. Table 1 gives the computing time for model 1 using from 2 to 16 CPUs; the computational time on 1 CPU is 6922 s. One reason for the super-linear speedup is that more linear iterations are needed to solve the global linear system on 1 CPU, which is much larger than the sub-domain systems on multiple CPUs, while the time-step size and the nonlinear iteration numbers remain the same, see Fig. 11. The parallel efficiency is estimated by comparison with the time for 2 CPUs, and shows that the parallel acceleration is remarkable. Table 2 gives the numerical results for model 1 with 400 × 800 cells. The largest number of CPUs is 3200, and the parallel efficiency is approximately 89%. This result shows that the parallelization method is very efficient and has good scalability for large-scale models.
6.2. Test Model 2. The initial density field carries a random disturbance, where ran is a disturbance factor, a random number between −1 and 1. The initial density is shown in Fig. 13 (right). Let (m, n) denote the cell located at the m-th column (x direction) and n-th row (r direction), with 1 ≤ m ≤ 20, 1 ≤ n ≤ 20.
In the leftmost column of grid cells, the source S(t) = 0.001 × (10 − (t/100 − 3)²) t is added, and its time variation curve is shown in Fig. 14.

Figure 13. The initial 20 × 20 grid and the initial density

We compare the result of using 16 CPUs with that computed on 1 CPU. Fig. 15 shows that at time 600 (60 ns) the grid obtained on 1 CPU (top) is almost the same as that on 16 CPUs (bottom). Figs. 16-18 show respectively the temperature variation curves of the electron, ion and photon at different cells. We can see that they coincide completely over the whole computation. Fig. 19 gives the time-steps and the nonlinear iteration numbers obtained by the original sequential method on 1 CPU and by the parallelization method on 16 CPUs, respectively. It shows that for the parallelization method the time-step size remains the same, and the nonlinear iteration numbers do not increase by much. In future work we will consider how to improve the prediction of the inner boundary conditions in order to reduce the nonlinear iteration numbers. Table 3 gives the computing time for model 2 using from 2 to 16 CPUs; the computational time on 1 CPU is 3912 s. The parallel efficiency is estimated by comparison with the time for 2 CPUs. Note that the increase of the nonlinear iteration numbers reduces the parallel efficiency; moreover, when the grid is kept at 20 × 20 and 16 CPUs are used, the number of cells per CPU is not larger than 36, so the computational granularity on each CPU is very small.

Figure 19. The time-steps and the nonlinear iteration numbers

7. Conclusion. A parallel iterative method has been presented for solving Lagrangian three-temperature radiation-hydrodynamic problems, based on overlapping domain decomposition combined with Picard nonlinear iteration. The computational mesh moves with the fluid, and then the interface among sub-domains also moves. The three-temperature system of nonlinear diffusion equations is solved by the LSOM method on arbitrary quadrilateral grids. The sub-domain problems are linearized and can be solved in parallel on massively parallel computers.
Numerical results show that the parallel iterative method is accurate, giving the same numerical results as the fully implicit scheme; moreover it is CPU-time scalable, and linear parallel speedup is obtained in our numerical tests.