A high-order hybridizable discontinuous Galerkin method with fast convergence to steady-state solutions of the gas kinetic equation

The mass flow rate of Poiseuille flow of rarefied gas through long ducts of two-dimensional cross-sections with arbitrary shape are critical in the pore-network modeling of gas transport in porous media. In this paper, for the first time, the high-order hybridizable discontinuous Galerkin (HDG) method is used to find the steady-state solution of the linearized Bhatnagar-Gross-Krook equation on two-dimensional triangular meshes. The velocity distribution function and its traces are approximated in the piecewise polynomial space (of degree up to 4) on the triangular meshes and the mesh skeletons, respectively. By employing a numerical flux that is derived from the first-order upwind scheme and imposing its continuity on the mesh skeletons, global systems for unknown traces are obtained with a few coupled degrees of freedom. To achieve fast convergence to the steady-state solution, a diffusion-type equation for flow velocity that is asymptotic-preserving into the fluid dynamic limit is solved by the HDG simultaneously, on the same meshes. The proposed HDG-synthetic iterative scheme is proved to be accurate and efficient. Specifically, for flows in the near-continuum regime, numerical simulations have shown that, to achieve the same level of accuracy, our scheme could be faster than the conventional iterative scheme by two orders of magnitude, while it is faster than the synthetic iterative scheme based on the finite difference discretization in the spatial space by one order of magnitude. The HDG-synthetic iterative scheme is ready to be extended to simulate rarefied gas mixtures and the Boltzmann collision operator.

less than 0.001. Beyond this regime, gas flows deviate from equilibrium and the Boltzmann equation from the gas kinetic

The gas kinetic equation
The Boltzmann equation describes the evolution of the VDF depending on spatial position , and time t ′ . In Cartesian coordinates it has the form of: Here, f ′ is the VDF that is defined so that the number density of gas molecules at time t ′ , with velocity within the limits v ′ and v ′ + dv ′ , and spatial location within x ′ and x ′ + dx ′ , is equal to f ′ dv ′ dx ′ . a ′ = (a ′ 1 , a ′ 2 , a ′ 3 ) is the external acceleration, while C( f ′ ) is the collision operator, which describes the change in VDF after binary collisions [1].
Due to complexity of the collision operator, the full Boltzmann equation is amenable to analytical solutions only for few special cases. In practice, deterministic solution is commonly sought for gas kinetic models that reduce C( f ′ ) to simpler collision operators; frequently used are the BGK [32], ellipsoidal statistical BGK [33], and Shakhov [34]m o d e l s . Here we develop the numerical scheme based on the following BGK equation, which is written in the non-dimensional form as: where v is v ′ normalized by the most probable speed v m = √ 2RT 0 at the reference temperature T 0 with R being the specific gas constant, x is x ′ normalized by the characteristic flow length H , a is a ′ normalized by v 2 m /H, t is t ′ normalized by H/v m , n is the number density of gas molecules normalized by the average number density n 0 at T 0 , T is the gas temperature normalized by T 0 , and f is f ′ normalized by n 0 /v 3 m . The coefficient ω is the viscosity index, i.e. the shear viscosity μ of the gas is proportional to T ω . The normalized equilibrium VDF is defined as: where u = (u 1 , u 2 , u 3 ) is the macroscopic flow velocity normalized by v m . Finally, the equivalent rarefaction parameter δ is defined as the inversed Knudsen number: with p 0 and μ 0 being the pressure and shear viscosity of the gas at reference temperature T 0 , respectively.
When the flow velocity is sufficiently small compared to v m , and the external acceleration is also small, we can linearize the VDF about the global equilibrium state f eq as: and the VDF h(x, v) for the perturbation is governed by the following linearized BGK equation [35]: in which we have omitted the derivative with respect to the time since we are only interested in the steady-state solution. The macroscopic gas variables, including the perturbed number density ̺, the flow velocity u, and the perturbed temperature τ , are calculated from the velocity moments of the perturbed VDF: ̺ = hf eq dv, u = vhf eq dv, τ = 2 3 |v| 2 hf eq dv − ̺. 3 ) are chosen to represent the VDF [5]. If we denote h j = h(x, v j ), L j ̺, u, τ = L ̺, u, τ , v j , and f j eq = f eq v j , the linearized BGK equation is replaced by a system of differential equations for h j that are discrete in the velocity space but still continuous in the spatial space: which are usually solved by the following conventional iterative scheme (CIS): where the superscripts (t) and (t + 1) represent two consecutive iteration steps. The iteration is terminated when the convergence to the steady solution is achieved. For conciseness, we will omit the index of iteration step in the remainder of the paper unless necessary.
Note that L j contains macroscopic variables, which can be evaluated using some numerical quadratures: where ̟ j is the weight of the quadrature rule. Various quadratures are used, including the Gauss-Hermite quadrature [36] and the composite Newton-Cote rule with uniform and non-uniform velocity discretizations [6,9].

The synthetic iterative scheme for fast convergence
It is well known that the iterative scheme (9)i s very efficient in the free-molecular flow regime where binary collisions are negligible. However, for near-continuum flows the iteration scheme converges slowly and the results are very likely to be biased by accumulated rounding errors. The SIS, which has the asymptotic-preserving property in the NS limit and enables rapid convergence to the steady-state, has been developed for the linearized kinetic equations [23,24,12]t o achieve high efficiency and accuracy.
In this paper, we consider the steady gas flow along a channel of arbitrary cross-section in the x 1 -x 2 plane, subject to a small pressure gradient in the x 3 direction. It is assumed that the channel length is significantly larger than the other dimensions of its cross-section plane as well as the mean free path of gas molecules, thus the end effects can be neglected and the flow field only varies in x 1 and x 2 directions. Suppose the pressure gradient X P , which is normalized by p 0 /H, is very small, then the term 2a · v j in the linearized BGK equation (9)c a n be replaced by −X P v j 3 , and the synthetic diffusion equation for the flow velocity u 3 in the x 3 direction is given by [12]: where F m,n,l (x 1 , ̟ j are high-order moments, with H n (v) being the n-th order physicists' Hermite polynomial. It should be noted that Eq. (11)i s derived from the linearized BGK equation with no approximation, which only works for Poiseuille flow when the flow velocity is perpendicular to the computational domain. In the near-continuum flow regime where δ is large, the diffusion equation is reduced to the NS equation ∂ 2 u 3 /∂x 2 1 + ∂ 2 u 3 /∂x 2 2 = X P δ. That is to say, it is asymptotic-preserving to the hydrodynamic limit. Since the diffusion equation exchanges the information very efficiently, fast convergence and high accuracy in the near-continuum flow regime can be easily achieved by solving the gas kinetic equation (9)i n parallel with the diffusion equation (11). On the other hand, when δ is very small, i.e. the flow is highly rarefied, high-order moments will play significant roles. We assume X P =−1i n the following calculations.

The HDG method
The discontinuous Galerkin (DG) finite element method was initially introduced for the neutron transport equation [37]. In the last few decades, after its success in solving nonlinear hyperbolic conservation laws and many convection-dominated problems [38,39], it is recognized as one of the most promising methods for next generation computational fluid dynamics. Similar to the FVM, DG methods assume discontinuous solution space, where the resulting equations are closed by approximation of the numerical flux on the cell interfaces. Instead of reconstructing the solution on large stencils, high-order spatial accuracy of the DG solution is sought by means of element-by-element polynomial functions. The compactness and their discontinuous nature make the methods ideal for parallelization and the implementation of hp-adaptive schemes.
In recent years, DG methods have been applied to the gas kinetic model equations [40], and the linearized/full Boltzmann equations [41][42][43]f o r the simulation of non-equilibrium gas flows. For kinetic model equations, it has been shown that the second-order DG discretization combined with the explicit Runge-Kutta time marching is more efficient than the second-order FVM scheme [40]. Despite these advantages, classical DG methods are computationally more expensive than their continuous Galerkin counterparts for steady problem or implicit scheme. This is mainly due to the large number of degrees of freedom in approximating field variables resulting from the discontinuous nature. This shortcoming is enlarged when solving the diffusion equation, where additional auxiliary variables are introduced to approximate the derivatives of the solution [44]. The HDG method was proposed to overcome this disadvantage [45]. By producing a final system in terms of the degrees of freedom interpolating traces of the field variables, HDG could significantly reduce the number of global unknowns, since the traces are defined on cell interfaces and single-valued. Therefore, the HDG method is more appropriate for steady and implicit solvers. This advantage is prominent for the gas kinetic simulation, where a cumbersome system of governing equations needs to be solved. The majority of HDG applications in fluid dynamics to date includes convection-diffusion flow [45], stokes flow [46], wave propagation problem [47] and incompressible/compressible NS flows [48][49][50]. In this paper, for the first time, the HDG method is designed for gas kinetic equations, which will be detailed below.

Hybridizable discontinuous Galerkin formulation
We apply the discontinuous Galerkin method to discretize the system in the spatial space. Let ∈ R 2 be a twodimensional domain with boundary ∂ in the x 1 -x 2 plane. Then, is partitioned into M el disjoint regular triangles i : The boundaries ∂ i of the triangles define a group of M fc faces Ŵ c : The HDG method provides an approximate solution to h j on i as well as an approximation to its trace ĥ j on Ŵ c in some piecewise finite element spaces V × W in the following forms: where P k (D) denotes the space of k-th order polynomials on the domain D, as shown in Fig. 1(a). Before describing the HDG formulation, we first define a collection of index mapping functions, namely σ and η that relates the local edge of a triangle ∂ e i to a global face Ŵ c [51]. Since the e-th edge of the triangle ∂ i is the c-th face Ŵ c , we set σ (i, e) = c so that . Similarly, since the interior face Ŵ c ∈ Ŵ\∂ is the intersection of the two triangles, namely left triangle i − and right triangle i + , we set η(c, +) = i + and η(c, −) = i − , then we denote Ŵ c = ∂ η(c,+) ∩ ∂ η(c,−) . At the boundary face Ŵ c ∈ ∂ , only the right triangle is involved. The mapping functions are illustrated in Fig. 1(b).

Formulation of HDG method
The HDG method solves the problem in two steps [45]. First, a global problem is set up to determine the trace ĥ j on the face Ŵ. Then, a local problem with ĥ j as the boundary condition on ∂ i is solved element-by-element to obtain the solutions for h j . Generally speaking, when moving from the interior of the triangle element i to its boundary ∂ i , ĥ j defines what the value of h j on the boundary should be. In the HDG method, it is assumed that ĥ j is singled-valued on each face. Introducing (·) and · as (a, b) D = D⊂R 2 (a · b)dx 1 dx 2 and a, b D = D⊂R 1 (a · b)dŴ, respectively, the weak formulation of Eq. (9)f o r the VDF h j in each element i is: where n is the outward unit normal vector, s j = L j − X P v j 3 3 , and F is the numerical trace of the flux defined as [52]: where α is a stabilization parameter on each edge ∂ e i calculated as [48]: On inserting Eq. (16)i n t o Eq. (15), we can express the solution of h j on each triangle as a function of ĥ j . In matrix form, it is written as where H i, j and Ĥ i, j are the vectors of degrees of freedom of h j and ĥ j on i and ∂ i , respectively. The coefficient matrices A i, j , S i, j and Â i, j are given in the Appendix A.
The global problem, used to determine ĥ j , is obtained by imposing the continuity of the normal flux at cell interfaces. For all ψ ∈ W, the weak formulation is: where F · n η(c,+) and F · n η(c,−) denote the numerical fluxes calculated from the left and right triangles, respectively, and Ĝ · n is the flux defined over the boundary ∂ flowing into the computational domain. By inserting the definition of the numerical flux (16), we obtain the following matrix system for the global problem: where Ĥ c, j is the vector of degrees of freedom of ĥ j on Ŵ c . Other coefficient matrices are given in the Appendix A.
After eliminating the unknowns H i, j with Eq. (18) and assembling Eq. (20)o v e r all the faces, the global problem becomes: where Ĥ j is the vector of degrees of freedom of ĥ j on all the faces Ŵ, K j is the global matrix of the linear system of equations. Once the values of ĥ j are obtained, an element-by-element reconstruction of the approximation of h j is implemented according to Eq. (18).

Strategy to solve the large sparse linear system
It is noted that the linear system (21)i s highly sparse, in which only face unknowns that involve two adjacent triangles are coupled at each row. Compared to a standard linear DG system, the trace system is much smaller and sparser when k > 1 [ 53]. However, for large-scale applications, the trace system is still the major bottleneck for efficient computation, so linear iterative solvers might be suitable [54,55].
However, when solving the gas kinetic equation, we notice that the global matrices K j remain unchanged during all iterations. Thus, in this paper we still employ the direct solver for the linear systems (21). First, we assemble the trace systems for all discrete velocities into a large one as: Second, the LU-decomposition is completed before iteration. Third, at each iterative step, the traces are obtained by direct substitution. Both the LU-decomposition and substitution phases are executed by calling the large-sparse linear solver PAR-DISO [56]. Note that the computational cost in terms of the CPU time listed in Sec. 4 only counts the computational time used in iterations, that is, the time to set up and factorize the global system is not included, which is negligible compared to the overall iteration time.

Implementation of boundary condition
Before describing the implementation of the boundary condition, we take an insight into the form of the numerical flux. If inserting the expression of flux (16)i n t o the continuity equation (19)a t interior faces, we immediately have: That is, the trace ĥ j at the interior face is equal, in a weak sense, to the average of h j η(c,+) and h j η(c,−) , which are evaluated at the interface from the left and right triangles, respectively. Then we obtain an equivalent expression for F · n: which is exactly the upwind scheme.
The flux Ĝ · n needs to be specified at the boundary ∂ to complete the formulation. To be consistent with the evaluation of fluxes at interior faces, we calculate the boundary flux as: where g j is the boundary value of h j and n is the outward unit normal vector at the boundary pointing into the flow field. In this paper, the fully diffuse boundary condition is used to determine the perturbed VDF g j at the solid surface. Suppose the solid wall is static and has the temperature T 0 , the perturbed VDF for the reflected molecules at the wall (i.e., which is always zero in this specific problem due to the Other types of boundary conditions, such as the diffuse-specular boundary condition with tangential momentum accommodation coefficient less than one, symmetry/periodic boundaries, as well as pressure inlet/outlet boundary could be incorporated straightforwardly [40,30].

HDG for the synthetic equation
The HDG method for solving the diffusion equation (11)h a s been well developed [45,52], in which two auxiliary variables are introduced to approximate the derivatives of u 3 , thus the HDG approximation is synchronously taken for the flow velocity u 3 , its derivatives ∇u 3 , and its trace û 3 . Here, we skip the details of the scheme, and discuss several modifications tailored for the current problem.
First, since the second-order partial derivatives of the high-order moments appear in the equation (11), we rewrite the equation into a first-order system in the form as: where the vector r is That is, the auxiliary variable q is introduced to approximate the combination of the derivatives of u 3 and high-order moments, which guarantees the stability and local solvability. Then, the flow velocity u 3 and its trace û 3 , as well as the vector q are discretized in the piecewise finite element Second, to specify the boundary condition of û 3 , we evaluate it from the perturbed VDF as: This could guarantee the proper value of the flow velocity at the boundary, especially when the slip velocity at the solid surface is large for highly rarefied gas flows. The procedure of the SIS for the linearized BGK equation is described as follows: • When h j,(t) and u (t) 3 are known at the t-th iteration step, calculate the VDF h j,(t+1) at (t + 1)-th step by solving the conventional iteration scheme (9), with the right-hand side given by 2δu 3 v j 3 3 + X P v j 3 3 . Note that the variations in density and temperature are zero in this specific flow; • From h j,(t+1) , calculate the vector r, i.e. the high-order moments F 2,0,1 , F 1,1,1 and F 0,2,1 in Eq. (27); • From h j,(t+1) , calculate the flow velocity trace û (t+1) 3 at boundary, see Eq. (28); by solving the synthetic diffusion equation (26), with the boundary condition obtained from the previous step.
The above iterative procedure is continued until the steady-state is reached. For the following calculation, the stabilization parameter appears in the expression of the numerical flux for q · n (Eq. (8) in Ref. [52]) is set to be 1.

Results and discussions
The HDG method of k up to 4 is applied to solve the linearized BGK kinetic model equation (9)i n parallel with the synthetic diffusion equation (11). The convergence criterion for the iterative procedure described in Sec. 3.2 is that the global relative residual in flow velocity between two successive iteration steps is less than 10 −5 . The residual is defined as In addition to the profiles of flow velocity, we are interested in the property of dimensionless mass flow rate (MFR): To assess the accuracy and efficiency of the proposed scheme, our numerical results are compared with the discrete UGKS (DUGKS) solutions, which have been verified from the continuum to free-molecular flow regimes [22], or available data from literature. In the four test cases below, the convergence tests in terms of the discrete velocities are performed first to determine the number of points in the molecular velocity space: the convergence is said to be reached if further refinement would only improve the solutions by a magnitude no more than 0.5%. The entire tests are done in double precision on a workstation with Intel Xeon-E5-2680 processors and 132 GB RAM. During iteration, we call the relative routines in Intel Math Kernel Library (MKL) to invert the matrix. Moreover, to solve the HDG global equation, we call the direct sparse solver, PARDISO. The first two tests are done on single processor, and the internal parallelism for MKL functions is not activated. The last two simulations are run on multiple processors using OpenMP.

Fast convergence of the SIS: Poiseuille flow between two parallel plates
Poiseuille flow between two parallel plates with a distance of H is used to assess the accuracy and fast convergence of the proposed HDG solver. The one-dimensional flow is resolved on a two-dimensional domain of =[0, 0.5] ×[0, 1.0] with 4 uniform isosceles right triangles being set along the direction perpendicular to the plates, say, the x 2 direction. Therefore, the height of each triangle is equal to 0.354, which is larger than the mean free path when Kn < 0.354 or equivalently δ>2.50.
The MFR at different rarefaction parameter δ, obtained from the SIS with k = 3, is illustrated in Fig. 2(a) and compared with those from the DUGKS and the CIS. In the CIS, only the linearized BGK model equation (9)i s solved. 24 non-uniform points are employed with a truncation of [−4, 4] in each direction to discretize the molecular velocity space. Simulation parameters including the numbers of grid points employed in the DUGKS could be found in Ref. [22]. It is shown that when δ increases, the MFR first drops to the minimum value at δ ∼ 1 and then rapidly increases. The Knudsen minimum in the mass flow rate is due to the competition of two effects: when Kn increases, the slip velocity at the plates becomes larger, while the velocity profile becomes flatter [10]. The SIS could obtain MFRs with high accuracy on such a coarse grid over a wide range of flow regimes. The relative L 2 errors of the SIS results to the ones of the DUGKS are within 1.1%. However, the CIS results possess obvious errors when δ 150. For example, the MFR from the CIS is about 61.7% smaller than that of the DUGKS at δ = 886.2. This is due to the fact that the spatial resolution is too low such that the numerical viscosity is not negligible in comparison with the physical viscosity of the gas in the CIS, while in the SIS the macroscopic diffusion equation (11)i s solved with the physical viscosity.
Another superiority of the SIS over the CIS is immediately seen from Fig. 2(b), which shows the iteration steps to reach the steady-state solution for both CIS and SIS. When the CIS is used, the number of iteration steps increases rapidly with the rarefaction parameter in the near-continuum flow regime (δ ≥ 10), whereas those of the SIS only increases slightly. In the late transition flow regime (δ<1), however, the numbers of iterative steps are almost the same for both schemes. This is further confirmed in Table 1, where the relative L 2 error of MFRs (the DUGKS results are used as the reference solutions), the number of iteration steps, and the total CPU time are listed for various rarefaction parameters δ and degrees of approximation polynomials in the HDG method. It is interesting to note that with the same number of triangles, the number of iterative steps of the CIS reaches a constant value as the degree of polynomials in the HDG discretization increases. While at large δ, the number of iterative steps of the SIS drops when higher-order approximation polynomials are employed. Compared to the kinetic equation, the time to solving the fluid-dynamic equation (11)i s negligible, so the CPU time saving is proportional to the reduction of iteration steps. Therefore, the SIS needs significantly less time to reach converged solutions than the CIS. At δ = 8.862, the SIS with k = 4i s 10 times faster than the CIS, while at δ = 88.62 it is nearly 140 times faster.  To illustrate how the SIS works in the near-continuum flow regime, convergence histories of the SIS and CIS are plotted in Fig. 3 when δ = 88.62. Staring from the zero disturbance, the flow velocity gradually increases from zero due to the gas-gas and gas-surface collisions. From Fig. 3(a) we see that, near the wall the flow velocity quickly approaches the converged value, while the velocity in the bulk adjusts rather slowly. As a result, a large number of iterations is required in the CIS to promote the flow velocity reaching to the maximum value. However, this situation is completely changed in the SIS, where the macroscopic diffusion equation (11)q u i c k l y generates the parabolic velocity profile (the second-order derivative ∂ 2 u 3 /∂x 2 2 is very close to −δ) in the bulk, which boots the convergence significantly. From Fig. 3(b) it is found that the velocity profile of the SIS is already very close to the final converged solution, even after two iterations.

Comparison of HDG, FDM and DG: flow along a channel of square cross-section
The performance of the HDG-SIS is now assessed in the Poiseuille flow along a channel with a square cross-section of height H , by comparing with solutions obtained from the same SIS but with the second-order FDM for the approximation of spatial derivatives [12]. The flow is resolved on a domain of =[0, 1] ×[0, 1]. As shown in Fig. 4(a), the computational domain is partitioned with uniform triangles. For the discretization of the velocity space, 24 × 24 × 24 non-uniform points are used with a truncation of [−4, 4] in each direction. The typical flow velocity contours obtained by the HDG-SIS at δ = 100, 10 and 1 are shown in Fig. 4(b)-(d), respectively. It is observed that the maximum velocity emerges in the center of the flow field. As the rarefaction parameter δ decreases from 100 to 1, the maximum velocity reduces while the slip velocity in the vicinity of solid surfaces increases.  For the HDG-SIS, the L 2 errors of the MFR, the numbers of iterative steps, and the CPU time to obtain the converged solutions are listed in Table 2, for various numbers of triangles and degrees of approximation polynomials. The results obtained by the FDM-SIS are also listed in Table 3, where M p denotes the number of equally-distributed discrete points in the spatial space. The L 2 errors are calculated using the DUGKS results as reference. For the DUKGS simulations, the same discrete velocity grid as that in the SIS is employed, while 48 × 48 and 72 × 72 points are located in the spatial space for cases with δ<10 and δ ≥ 10, respectively. Table 3 Poiseuille flow along a channel of square cross-section solved by the FDM-SIS. M p is the number of discrete points in the spatial space, Itr is the number of iteration steps to satisfy the convergence criterion R < 10 −5 , and t c is the CPU time.  It is found from Table 2 that for the spatial grids with the same number of triangles, the HDG-SIS solutions with higher-order accuracy are obtained with higher-order approximation polynomials. Therefore, to achieve the same order of accuracy, solvers with higher-order polynomials require spatial grids with fewer triangles. For example, when δ = 100, the solver with 3 rd -order polynomials has an error of about 0.8% in the MFR using 18 triangles, while the one with 4 th -order polynomials reaches this accuracy with only 8 triangles. Moreover, as the rarefaction parameter decreases, fewer triangles are needed to obtain high-accuracy results. As far as the convergence speed is concerned, for all the rarefaction levels, solvers with different degrees of polynomials require almost the same number of iterations to obtain solutions with the same order of accuracy. For example, when δ = 100, about 50 iterations are required to obtain MFR with L 2 error less than 1%. Since fewer triangles are needed, the higher order the solver, the less the CPU time. At δ = 100, the CPU time to obtain solution with an error of ∼ 0.8% with k = 4i s about 70% of that with k = 3.
For the comparison of the HDG-SIS and the second-order FDM-SIS, we find that the HDG discretization is much more efficient. At δ = 100, the FDM-SIS predicts the MFR with an error less than 1% on the spatial grid with 55 × 55 discrete points, while the HDG-SIS obtains the solution with the same order of accuracy on 50, 18 and 8 triangles for k = 2, 3 and 4 solvers, respectively. Meanwhile, at δ = 1, the FDM-SIS obtains the MFR with an error less than 1% on 35 × 35 points, while the HDG-SIS obtains the solution only on 2 triangles for all solvers. The HDG solver of k = 4 could be 9.3t i m e s (more than 12 times) faster than the FDM solver to obtain converged results at δ = 100 (δ = 10). Fig. 5 shows the convergence history of the MFR in terms of the CPU time for the FDM-SIS and HDG-SIS with k = 4. It is clear that the HDG method is significantly more efficient especially for highly rarefied flows. Although higher-order FDM could achieve better efficiency, it needs much more computational effort since stencils involving a large number of points are required in the FDM scheme, which is extremely difficult to be implemented for complex geometries.
It is also interesting to compare the performance of the implicit HDG solver and the original DG method. The comparison is based on the CIS scheme for the flow at δ = 10. Here, we consider two different DG methods, one is based on the implicit iterative scheme like the one used in the current HDG solver, the other is the explicit Runge-Kutta DG (RKDG) gas kinetic solver [40]. The detailed formulation of the implicit DG (IDG) scheme is presented in the Appendix B. The number of degree of freedom N dof , the L 2 errors of the MFR, the numbers of iterative steps, iterative time interval t in the RKDG solver, and the CPU time are listed in Table 4 for the solvers with different approximating polynomials. Our numerical results show that the implicit HDG and DG with the same order on the same mesh yield the same solution (MFRs have at least 7 same significant digits) using the same number of iterative steps, since the numerical flux in the HDG is equivalent to the upwind scheme used in the IDG. Note that N dof refers to the number of degrees of freedom that appear independently in each scheme. For the HDG solver, it is calculated based on the trace unknowns as N dof = M v M fc (k + 1), while for the IDG or RKDG, it is based on the field unknowns as N dof = M v M el (k + 1)(k + 2)/2. For the structured triangular mesh used, the Table 4 Comparison between the HDG-CIS, IDG-CIS and RKDG-CIS for Poiseuille flow along a channel with square cross-section at δ = 10, where Itr is the number of iteration steps to satisfy the convergence criterion R < 10 −5 , N dof denotes the number of degree of freedom involving in each method, t is the iterative time interval and t c is the CPU time. Note that Itr and L 2 error in the HDG-CIS and IDG-CIS are exactly the same, while N dof for IDG-CIS and RKDG-CIS is the same.  number of faces M fc is around 1.7 times the number of triangles M el . Therefore, it is found that N dof in HDG is smaller than the one in the original DG scheme when k ≥ 2. The higher order and more triangles, the more significant this difference will be. Actually, N dof of the HDG becomes closer to that of the continuous higher-order FEM [53]. As a consequence, the HDG method costs less CPU time than the IDG method when k > 2. Although N dof of the HDG with k = 2i s smaller than that of the IDG, it is not more efficient. This is due to the fact that extra time is required to recover field solution from trace solution in the HDG. Another advantage of the HDG due to reduction in number of coupled degrees of freedom is to promote further parallelism. Compared to the HDG or IDG method, the RKDG solver needs more CPU time to obtain converged solutions. When the order of polynomials and the number of triangular elements increase, the number of iterative steps in the HDG or IDG solver approaches a fixed value (around 141 for the current case). For the RKDG solver, the iteration number increases substantially, as its time interval is restricted by the Counrant-Firedirchs-Lewy condition, i.e. |v j | max t ≤ CH min , where H min is the minimum altitude of triangles. We have set the constant C = 0.3f o r k = 1 and 0.2 for k = 2, respectively. Therefore, the implicit solvers are more efficient. To obtain the solution with L 2 error less than 1% on 128 triangles, the HDG solver with k = 1 could be 10 times faster than the RKDG solver.

Accuracy of the SIS: flows along the channels of various cross-sections
We further assess the accuracy of HDG-SIS by solving the Poiseuille flows along the channels of triangular, trapezoidal, and circular cross-sections, and comparing the MFRs to existing data. The geometries and meshes are illustrated in Fig. 6.   The velocity contours at δ = 100, 10, and 1a r e shown in Fig. 7. Similar to flows in the square channel, the maximum velocities appear in the center of the flow field, which decrease as the rarefaction parameter decreases. MFRs over a wide range of gas rarefaction are plotted in Fig. 8. It is found that MFRs for the triangular and trapezoidal channels are close to each other due to the fact that the hydraulic diameter is chosen as the characteristic length to non-dimensionalize the problem. If using the radius as the characteristic length, the MFR in the circular channel is larger than those in the other two channels. The Knudsen minimum again arises at δ ∼ 1. In all cases, the HDG-SIS results agree well with those in the literature [57,58], which demonstrates the accuracy of the proposed HDG-SIS scheme. It is worth to mention that the results in the literature were calculated from the linearized Shakhov kinetic model equation, where the additional correction of the heat flux in the collision operator has no effect on the MFR in this problem.

Capability to handle complex geometry: flows along Apollonian gasket channels
Finally, the HDG-SIS is applied to calculate the Poiseuille flows along the channels with cross-section described by the Apollonian fractal gasket, to demonstrate its capability of handling complex geometries. The cross-section of the original Apollonian fractal gasket is a fractal generated starting from a circle, which is filled by three circles with the same radius, each is tangent to the others (including the internal tangent with the outer circle, see Fig. 9(a)). Then for the next level, the structure is filled with three more circles, each is tangent to another three, see Fig. 9(b). Here, for the geometry we calculated, the inner circles are not tangent to anyone of the others, while their centers coincide to those in the original Apollonian gaskets and their radii are determined such that the porosity (the ratio of void area) is 0.7 for the first level and on this basis, the porosity of the second level is 0.65. The resulting geometries and meshes of the Level-1 and Level-2 structures are presented in Fig. 9(c)-(d). In the current simulation, we treat the inner small circles as solids and the fluid flowing through the gaps between the outer circle and the inner ones. To determine the rarefaction parameter, the radius of the outer circle is set as the characteristic length. Total 494 and 1082 triangles are employed in the spatial discretization and the velocity grid is the same as the previous test. Here the regular triangle elements without distortion are used. Therefore, the curved boundaries are approximated by line segments. Note that elements allowing curved boundaries could be used to achieve lower geometrical discretization error on mesh with fewer triangles. Fig. 10 displays the velocity contours in the different geometries with varying rarefaction parameters, where the velocity distributions possess an axial symmetry. When there is no solid inside the outer circle (the last row in Fig. 7), the maximum velocity is at the center of the domain. However, for the Level-1 geometry, the large flow velocities move along the radial direction to the outer boundary. While for the Level-2 geometry, the large flow velocities emerge in the center of the domain again. Fig. 11 shows the MFRs in the Poiseuille flow through the Apollonian gasket channels together with the one through the circular channel. As the recursion level increases, the porosity of the Apollonian gasket channel decreases, so as the MFR. The Knudsen minimum in the MFR can be seen, however, the location of the minimum MFR shifts towards larger values of δ in the Apollonian gasket channels compared to the one in the circle channel. This is because, in the calculation of δ the characteristic flow length H is selected to be the radius of the outer circle, which is larger than the radius of the solid near which the flow velocity is maximum.

Conclusions
In summary, based on the high-order HDG discretization, we have developed an accurate and efficient numerical method to find the steady-state solution of the linearized BGK model equation, for rarefied Poiseuille gas flow through channels with cross-sections of arbitrary shapes. First, an HDG solver with approximation polynomial of degree up to 4 has been developed. The velocity distribution functions and their traces are approximated on arbitrary triangular mesh and the mesh skeleton, respectively. Based on the first-order upwind scheme, a numerical flux has been designed to evaluate the convection between adjacent cells. By imposing the continuity of the normal flux, a final global system for traces of the velocity distribution function is obtained. Since the traces are defined on the cell interfaces and have single-values, the global coupled degrees of freedom of the unknowns are significantly reduced compared to the classical discontinuous Galerkin method. The boundary condition has been implemented in a unified framework, the same as the calculation of flux on interfaces.
In parallel to the HDG solver for the gas kinetic equation, a macroscopic diffusion equation for flow velocity is synchronously solved on the same mesh. At each iterative step, the collision operator L in the bulk region is corrected by the flow velocity from the diffusion equation. Since the macroscopic equation boosts the exchange of information, fast convergence with asymptotic-preserving into the hydrodynamic limit is realized for the steady-state solution in the near-continuum flow regime. On the other hand, high-order moments of the velocity distribution function in the diffusion equation preserve the accuracy of the scheme in highly rarefied gas flows. Four different problems of Poiseuille flow along channels with various cross-sections have been presented to show accuracy and capability of the proposed scheme. Several conclusions are summarized through the performance analysis: • Compared to the conventional iterative scheme, the synthetic iterative scheme can significantly reduce the number of iterative steps to reach the steady-state solution in the near-continuum flow regime: the synthetic iterative scheme could be more than 100 times faster.
• To obtain the results with the same order of accuracy, the HDG solver with higher degree of approximation polynomial requires fewer triangles in spatial discretization. As a result, the computational time and memory consumption can be further reduced.
• Compared to the synthetic iterative scheme solved by the finite difference method, the HDG discretization is much more efficient. To obtain the results with the same order of accuracy, the HDG scheme can be faster than the finite difference method by one order of magnitude.
• Compared to the implicit discontinuous Galerkin scheme, the HDG solver is more efficient when the degree of approximating polynomial is larger than 2, since the number of coupled degrees of freedom is reduced. Compared to the explicit Runge-Kutta discontinuous Galerkin gas kinetic solver, the HDG solver requires significantly fewer iterative steps thus less CPU time to obtain converged solutions.
It is worth mentioning that the basic HDG formulation developed in this paper is not limited to the linearized BGK equation. It is straightforward to be extended for other gas kinetic model equations, or even the full Boltzmann equation by adopting a proper method (e.g. fast spectral method [59] and conservative projection method [60]) to calculate the Boltzmann collision operator. Since the computational cost of the Boltzmann collision operator is much higher than that of the gas kinetic models, and the HDG with higher degree of approximation polynomial can reduce the spatial triangular meshes (and hence the nodal points where the Boltzmann collision operator is evaluated), the advantage of using HDG method will become more pronounced. Also, the HDG-SIS is ready to be extended for the simulation of rarefied gas mixtures. where Ĥ j is the vector of nodal value of ĥ j that is the sum of all the faces in the computational domain, and where A is the conventional assembly operator. To obtain the global matrix K j and vector R j , the dense matrices A i, j with dimension K el × K el for each i = 1, ..., M el , j = 1, ..., M v need to be inverted. Then, the sparse unsymmetrical linear system (A.7)i s directly solved to determine Ĥ j . Finally, H i, j is updated in an element-by-element fashion according to Eq. (A.4).

Appendix B
We present the formulation of the implicit DG method for Eq. (8). Unlike the HDG method, the original DG method only provides the approximate solution to h j on i in the piecewise finite element space V. At each iterative step, the following weak formula is solved to determine the field unknowns: − ∇ϕ, v j h j i + 3 e=1 ϕ,F · n ∂ e i + ϕ,δh j i = ϕ, s j i , for all ϕ ∈ V, where the numerical flux is defined based on the upwind scheme: Assembling the weak formulation over all triangles, we obtain the global system in the following matrix form: where H j is the vector of the nodal value of h j that is the sum of all the triangles in the computational domain, and A ext(i,e), j , with N l ext(i,e) being the nodal function in the neighboring triangle. Here, A is the conventional assembly operator. To solve the linear system, direct solver is applied. The global matrices K j remain unchanged during all iterations. Before iteration, the coefficient matrices K j are assembled for all discrete velocities and the LU-decomposition is conducted. At each iterative step, h j are obtained by direct substitution. Both the LU-decomposition and substitution phases are executed by calling PARDISO.