A comparative study of discrete velocity methods for low-speed rarefied gas flows

In the study of rarefied gas dynamics, the discrete velocity method (DVM) has been widely employed to solve the gas kinetic equations. Although various versions of DVM have been developed, their performance, in terms of modeling accuracy and computational efficiency, is yet to be comprehensively studied in all the flow regimes. Here, the traditional third-order time-implicit Godunov DVM (GDVM) and the recently developed discrete unified gas-kinetic scheme (DUGKS) are analysed in finding steady-state solutions of the low-speed force-driven Poiseuille and lid-driven cavity flows. With the molecular collision and free streaming being treated simultaneously, the DUGKS preserves the second-order accuracy in the spatial and temporal discretizations in all flow regimes. Towards the hydrodynamic flow regime, not only is the DUGKS faster than the GDVM when using the same spatial mesh, but also requires less spatial resolution than that of the GDVM to achieve the same numerical accuracy. From the slip to free molecular flow regimes, however, the DUGKS is slower than the GDVM, due to the complicated flux evaluation and the restrictive time step which is smaller than the maximum effective time step of the GDVM. Therefore, the DUGKS is preferable for problems involving different flow regimes, particularly when the hydrodynamic flow regime is dominant. For highly rarefied gas flows, if the steady-state solution is mainly concerned, the implicit GDVM, which can boost the convergence significantly, is a better choice. © 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license. ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
Multi-scale flows, where different temporal and spatial scales are presented, are often found in nature and engineering, which represent a modeling and simulation challenge. The gas flow at different scales can be categorized by the Knudsen number ( Kn ), defined as the ratio of the mean free path of gas molecules to the characteristic length of the flow field. It is well recognized that the computational fluid dynamics based on the Navier-Stokes (NS) equations and the direct simulation Monte Carlo (DSMC) method [1] are two dominant methods for the efficient and accurate simulation of the hydrodynamic ( Kn < 10 −3 ) and rarefied gas (transition, 0.1 < Kn < 10; free molecular, Kn > 10) flows, respectively. However, in the slip regime ( 10 −3 < Kn < 0 . 1 ), the NS solvers and the DSMC method become either inaccurate or inefficient: the NS equations are inappropriate to describe rarefied (non-equilibrium) gas flows because they are derived based upon the near equilibrium hypothesis, while the particle nature of the DSMC method provide accurate numerical results, which can serve as the reference solutions. However, the high computational cost in calculating the complicated collision operator makes them impractical for many applications [8] . Therefore, the Boltzmann equation is usually replaced by simplified kinetic model equations, such as the Bhatnagar-Gross-Krook (BGK) [22] , ellipsoidal statistical (ES) [23] , and Shakhov [24] models. And most of DVMs are developed for these Boltzmann model equations.
In the traditional DVM, the Boltzmann model equation is explicitly solved through the operator splitting method [15] , where the time step and cell size are limited by the mean collision time and mean free path of gas molecules, respectively. Consequently, like the DSMC method, the DVM works well for highly rarefied gas flows, but encounters great difficulties for near-continuum flows [25,26] . Some semi-implicit and implicit DVMs have been developed to remove the restriction of the time step and improve the efficiency [10,27,28] .
In order to develop an efficient DVM for all the flow regimes, significant effort has been made recently to develop the asymptotic preserving (AP) schemes [11,[27][28][29][30][31] . An AP method is stable with respect to Kn , and when Kn is very small, it is consistent with the Chapman-Enskog representation in the continuum limit [8,31] . Therefore, the AP property is critical to a multi-scale method. Unfortunately, most AP schemes can only recover the Euler solutions in the hydrodynamic limit, except for the recently developed unified gas-kinetic scheme (UGKS) [25,26,[31][32][33] and the discrete unified gas-kinetic scheme (DUGKS) [34][35][36][37][38][39] , which recover the NS solutions. Both the UGKS and DUGKS share the same merit that the molecular transport process is coupled with the molecular collision, so that the time step and mesh size are independent of the collision time and the mean free path, respectively [25] .
The main difference between the UGKS and DUGKS lies in the construction of the distribution function across the cell interface: the UGKS uses the local integration solution of the kinetic model, while the DUGKS adopts its discrete characteristic solution, thereby avoids computing the complicated gradients of macroscopic variables. Also, owing to auxiliary functions introduced, the DUGKS only updates single distribution function in the evolution process, while in the UGKS macroscopic variables and distribution function are updated within one time step. Therefore, the DUGKS is better than the UGKS in terms of simplicity and efficiency, while their accuracies are at the same level [40,41] .
So far, the DVM can be roughly classified into two types: the traditional DVM and new AP DVM. The detailed comparison of these two methods will provide essential information for users to choose the appropriate one for applications. In this paper, we will perform a comparative study of these two type DVMs in all the flow regimes, aiming to clarify their applicability for different flow problems. It is usually recognized that it is not easy for a secondorder accurate traditional DVM to simulate the continuum flow due to the limitations of mesh size and time step, hence a thirdorder accurate time-implicit Godunov DVM (GDVM) [10] is adopted here in all the flow regimes including the hydrodynamic regime. On the other hand, it has been demonstrated that the DUGKS, as a newly developed AP DVM, can dynamically describe flows from the free molecular to hydrodynamic regimes and simultaneously preserve a second-order accuracy in both the spatial and temporal spaces [36,40,42] . Although the two methods are derived from the same model equation, different algorithms will lead to solution discrepancy. In this work, we will analyze these two typical DVMs in terms of accuracy and efficiency.
The remaining part of this paper is organized as follows. We first make a brief introduction of the time-implicit GDVM and DUGKS, as well as an analysis of both methods in Section 2 . The detailed comparison of these two methods regarding accu-racy and efficiency is given in Section 3 , followed by conclusions in Section 4 .

Numerical methods
In this section, the GDVM [10] and DUGKS [35] are used to solve the Shakhov model equation for monatomic gases [24] .

The Shakhov model
In the absence of external force, the Shakhov kinetic model equation can be written as where f = f (x , ξ, t ) is the velocity distribution function of gas molecules with the molecular velocity ξ = (ξ x , ξ y , ξ z ) at the position x = (x, y, z) and the time t , and f S is the reference equilibrium distribution function expressed by the Maxwellian distribution function f eq and a heat flux correction term: where Pr is the Prandtl number, c = ξ − U is the peculiar velocity with U being the macroscopic flow velocity, q = 1 2 cc 2 f d ξ is the heat flux, R is the specific gas constant, and T is the temperature of the gas. The collision time τ in Eq. (1) is related to the dynamic viscosity μ and pressure p by τ = μ/p. The Maxwellian distribution function f eq is given by where ρ is the gas density. The conservative variables W ≡ ( ρ, ρU , ρE ) T are calculated from the velocity moments of the distribution function: where ψ = 1 , ξ, 1 2 ξ 2 T and ρE = 1 2 ρU 2 + 3 2 ρRT is the total energy.
Since only two-dimensional (2D) problem is considered in this work, two reduced velocity distribution functions are introduced to cast the three-dimensional molecular velocity space into 2D [10] : For convenience, in what follows we denote ξ = (ξ x , ξ y ) and x = (x, y ) . Thus, based on g and h , we can compute macroscopic variables by The governing equations for the two reduced distribution functions can be deduced from Eq. (1) as where the reduced reference distribution functions g S and h S are h eq = RT g eq , (9b) It is clear that the updating rules for g and h in Eq. (8) have the same structure where the generic symbol φ is used to denote g or h .
Note that the dynamic viscosity μ for the hard-sphere (HS) or variable hard-sphere model (VHS) is Using the Knudsen number ( Kn ), Mach number ( Ma ) and Reynolds number ( Re ), which are respectively defined as and they are related by where γ is the specific heat ratio, L ref , U ref and ρ ref are the reference length, velocity and density, respectively.

The traditional discrete velocity method
The traditional DVM we adopt here is also based on Eq. (10) which is discretized in time by the fully time-implicit Godunov-type scheme [10,43] : 1 t n + ξ · ∇ + 1 τ n φ n = RHS n , where φ n = φ n +1 − φ n needs to be determined at each time step.
The right-hand side RHS n of Eq. (15) is the explicit part, where the spatial derivative is approximated by the third-order upwind scheme. In this work, the derivative with respect to the mesh point On the other hand, the left-hand side of Eq. (15) is the implicit part, where the spatial derivative is approximated by the firstorder upwind scheme. By marching in appropriate direction, e.g.
increasing x in the case of ξ x > 0, the unknown φ n can be obtained directly without iterations. Note that t in Eq. (15) is a pseudo-time step that is defined by the Courant-Friedrichs-Lewy (CFL) condition i.e., t = η x min /ξ max , where η is the CFL number, x min is minimum grid spacing, and ξ max is the maximum discrete velocity. However, here the CFL number η can be smaller than 1 to capture the transient behavior, it can also be set as large as 10 4 for steady-state flow problems.

Discrete unified gas-kinetic scheme
The DUGKS is an explicit finite-volume method to solve Eq. (10) . The computational domain is first divided into some control cells; then integrating Eq. (10) in a cell V j (centered at x j ) from time t n to t n +1 ( t = t n +1 − t n ) , and using the trapezoidal and middle-point rules for the time integration of the collision and convection terms, respectively, we can obtain the evolution equation of DUGKS: are two auxiliary distribution functions, and is the micro-flux across cell interface, here | V j | and ∂V j are the volume and surface of the cell V j , n is the outward unit vector normal to the cell interface. Based on the conservative property of collision operators: and the heat flux from Therefore, in actual implementation, the evolution of ˜ φ is tracked according to Eq. (17) , instead of the original distribution functions φ, to avoid implicit computations.
The key procedure in updating ˜ φ is to evaluate the micro-flux F, which is solely determined by the gas distribution function  where time integration of the collision term is approximated by the trapezoidal rule. Again, in order to remove the implicity of Eq. (22) , two distribution functions are introduced Then Eq. (22) is expressed explicitly as where φ+ ,n is constructed as where σ j is the slope of φ+ in the cell j which is computed by the central difference method. Note that σ j can also be approximated by using some numerical limiters for discontinuous problems [40] . Once φ+ ,n is given, the original distribution function across the cell interface can be calculated from Eq. (23) : where φ S,n +1 / 2 (x b , ξ) is determined by the conserved variables and the heat flux on the cell interface x b and at the half time step t n +1 / 2 , which can be evaluated as Then the micro-flux can be computed by Eq. (26) . Finally ˜ φ at the cell center can be updated according to Eq. (17) . Note that the time step in the DUGKS is solely determined by the CFL condition. Both the GDVM and DUGKS presented above are based on continuous velocity space for convenience. In actual implementation, the continuous velocity space is discretized into a finite discrete velocity set { ξ i } the same as that of the traditional DVM [10] .
For example, in the DUGKS, the distribution functions i.e., ˜ g and ˜ h are approximated at these discrete velocity points as ˜ g i and ˜ h i . Proper quadrature rule, such as the Newton-Cotes and Gauss-Hermite quadratures, are adopted to approximate the moments, where ϖ i is the weight coefficients for the corresponding quadrature rule.

Analysis of the DUGKS and GDVM
Both the DUGKS and GDVM are derived from the same Boltzmann model equation, but different considerations in their algorithms determine their distinctive behaviors in flow simulations.
In the DUGKS, the flux is solely determined by molecular distribution functions across the cell interfaces, which is constructed on basis of the discrete characteristic solution of the kinetic model. Based on Eqs. (23) , (24) and (26) , it can be rewritten as At the right-hand side of Eq. (30) , the first and second terms represent the kinetic and hydrodynamic contributions, respectively. It indicates that the molecular transport process is coupled with molecular collisions when evaluating flux across the cell interface. In the continuum and near continuum flow regions, t / τ ≫ 1, thus, the flux computed from Eq. (30) is mainly contributed from the hydrodynamic scale solution; however, in highly rarefied flow regime, the molecular free transport mechanism will play an important role due to t / τ ≪ 1; in the transition regime, the time step t is comparable to τ , thereby both the kinetic and hydrodynamic physics are important. Therefore, with variation of the ratio of t / τ , the DUGKS can dynamically describe the flow from the free molecular to hydrodynamic regimes. It also has been demonstrated that with the coupled treatment of molecular transport and collision processes, the numerical dissipation in DUGKS is at In contrast, in the GDVM, the model equation is directly solved using the implicit finite-difference method, and the convection term is approximated by the upwind scheme, which means that molecules transport across two grid points freely. Therefore, for flow regimes in which mesh size is much larger than the mean free path, the use of upwind scheme is obviously inappropriate, since molecules will physically encounter many collisions when they transport such a long distance in a mesh size scale. Thus, the GDVM requires much finer mesh to resolve the flow in the near continuum regimes [26] . Note that the adoption of the third-order upwind approximation Eq. (16) for the convection term in the explicit part of Eq. (15) may improve the GDVM's performance in the continuum and near-continuum regimes. It is also noted that this finite-difference DVM com putes less equilibrium state distribution functions than the DUGKS with a finite-volume formulation, thus, with the same CFL number, the GDVM should be faster than DUGKS for each iteration. In addition, it should be bear in mind that the GDVM becomes an implicit method when using a larger CFL number ( η ≫ 1), and it will lead to fast convergence of the GDVM. Therefore, the DUGKS may work well in all the flow regimes, and the GDVM is preferable for highly rarefied flows, but may encounter great difficulty in the continuum and near continuum regimes. It should be noted that although the time step t in these two methods are both determined by the CFL condition, for GDVM, t is a pseudo-time step and has no contribution to the numerical error, thereby the results obtained by the GDVM with small CFL number and the implicit GDVM with larger CFL number have the same accuracy. The above points will be verified in the following simulations.

Force-driven Poiseuille flow
The performance of the GDVM and DUGKS is first evaluated by simulating the one-dimensional (1D) force-driven Poiseuille flow between two parallel plates with temperature T w , which are located at y = 0 and y = H. An external force is applied in the x - where F x is the force term. Suppose the magnitude of the external acceleration G is very small, the force term can be approximated where φ eq is formed in Eqs. (9a) and (9b) .
In the GDVM, Eq. (31) is directly solved by considering F x as a source term, while in the DUGKS, the Strang splitting method is used [44] : at the beginning of each time step, the distribution function ˜ φ n is updated within a half time step by ∂ t ˜ φ = tF x / 2 , and then the procedure of DUGKS is executed followed by updating ˜ φ n +1 within a half time step in the same way as that at the beginning of each iteration.
In our simulations, we use 10, 20, and 10 0 uniform mesh points between two parallel plates with the distance H = 1 . The gas flow from the highly rarefied to the hydrodynamic regimes (the Knudsen number from 10 to 10 −4 ) is simulated by varying the gas pressure. The diffuse boundary condition is applied on both the plates. The hard-sphere gas is considered, where the exponent ω in Eq. (11) is 0.5. As a matter of fact, when the magnitude of external force is small, the flow is nearly isothermal, so that the mass flow rate is not affected by the temperature-dependence of the shear viscosity. Our simulations start from a global equilibrium state. The convergence criterion for the steady-state is defined by The discretization of the molecular velocity space depends on the rarefaction level of the gas flow. In this study, we focus on  the low-speed flows, so for the near hydrodynamic regime, the highly accurate Gauss-Hermite integration with fewer discrete velocity points is usually applied, while the Newton-Cotes formulas with more discrete velocity points could be adopted to capture discontinuities in the distribution function in highly rarefied regime. Therefore, for the cases of 1 ≤ Kn ≤ 10 and 0.1 ≤ Kn < 1, we, respectively, use the 10 0 × 10 0 and 50 × 50 non-uniform discrete velocity points [19] at finite range of [ −4 , 4] × [ −4 , 4] to approximate the continuous molecular velocity space, while for the cases of 0.01 ≤ Kn < 0.1 and 10 −4 ≤ Kn < 0 . 01 , the 28 × 28 and 8 × 8 halfrange Gauss-Hermit discrete velocity points are applied, respectively. Note that all the parameters presented in this paper are dimensionless, where the spatial length and molecular velocity are scaled by H and ξ 0 = √ 2 RT w , respectively. The velocity profiles along the channel cross section at Kn = 10 , 1, 0.1, and 10 −3 are plotted in Fig. 1 . The numerical results of the DUGKS with grid points of 10 0 can be regarded as the reference solutions. It is found that the DUGKS can give adequately accurate results with just 10 grid points in all the flows, while for the GDVM, 20 and 10 0 mesh points are respectively required in highly rarefied and near-continuum regimes. For example, when Kn = 10 −3 , the GDVM with 20 mesh points underpredicts the velocity in the channel center by 16%, while that of the DUGKS is only 2% even with a coarser mesh of 10, see Fig. 1 (d).
We then compare the apparent gas permeability κ predicted by the GDVM and DUGKS, which is defined by [48] Fig. 2 shows the permeability at different Knudsen numbers. For highly rarefied flows ( Fig. 2 (a)), the results obtained from the GDVM and DUGKS agree well with each other for the given meshes. However, when the flow approaches the slip and hydrodynamic regimes ( Fig. 2 (b)), in order to obtain accurate results, the GDVM requires the spatial mesh that is about one order of magnitude finer than that of the DUGKS. Note that when Kn = 10 −4 , the permeability obtained from the GDVM with 10 0 mesh points is not presented in Fig. 2 (b), which is time consuming to compute.
The above comparisons demonstrate superiority of the DUGKS over the GDVM in terms of the mesh requirement. However, the computational efficiency is another important issue. To this end, we study the time needed for each iteration, as well as the iteration numbers needed to reach the convergence. The CPU time cost for each iteration is assessed when both codes are executed on the same workstation (Dual Intel Xeon CPU E5-2630 v3 @ 2.4 GHz with 64Gb of RAM memory). It is found that for the case of Kn = 10 with 10 0 mesh points, the DUGKS needs 0.0593 s for each iteration, which is about twice as much as the GDVM. Iteration steps of the GDVM and DUGKS to achieve the steady-state defined by Eq. (33) are also given in Fig. 3 . With the same CFL number η = 0 . 5 , both methods have the similar convergency rate in the highly rarefied regime, while in the near continuum regime, the DUGKS convergents much faster than the GDVM. When using a larger CFL number up to 10 6 , the convergence rate of the implicit GDVM turns to be about two orders of magnitude faster than the explicit DUGKS when Kn > 1, however, although using  such large CFL number, the GDVM is still about one order of magnitude slower than the DUGKS in the hydrodynamic regime, i.e., Kn < 0.001. Moreover, as shown in Figs. 1 and 2 , in order to obtain the same accurate results, for the cases of Kn ≤ 0.1 and Kn ≥ 1, the GDVM needs 10 0 and 20 grid points, respectively, while the DUGKS only requires 10 mesh points in the whole regime. So to produce reasonably accurate results, the DUGKS requires fewer mesh points than the GDVM. As a result, the efficiency of DUGKS can be significantly improved. The total CPU time costs of the implicit GDVM and DUGKS are presented in Table 1 . It is found that the DUGKS is about two orders of magnitude faster than the implicit GDVM in near hydrodynamic regime, while as Kn increases, the implicit GDVM turns out to be about two orders of magnitude faster than the DUGKS in the highly rarefied regime.
It should be emphasized that for the Poiseuille flow in a straight infinite channel, the flow driven by an external force and a pressure gradient are equivalent, which is confirmed by the results of the apparent gas permeability at 0.1 ≤ Kn ≤ 10 as shown in Fig. 4 . It is found that, the mesh independent results for the force-driven and pressure-driven flows obtained from the GDVM and DUGKS are in excellent agreement.

Lid-driven cavity flow
In addition to the force-driven Poiseuille flow, the comparative study between the GDVM and DUGKS is also performed on a 2D lid-driven cavity flow, which is a standard benchmark problem to validate numerical accuracy and efficiency [26,34,36,45] . Here, the Knudsen number is chosen to be Kn = 10 , 1, 0.1, 0.0259, and 6 . 47 × 10 −4 , so that the flows vary from the free molecular to hydrodynamic regimes. For the cases of Kn = 0 . 0259 and 6 . 47 × 10 −4 , the corresponding Reynolds numbers are Re = 10 and 400, respectively. The length and height of the cavity are both set to be 1. The Mach number defined by the velocity of the top-wall U w is 0.16, while the other three walls are stationary. The temperature of all the walls is fixed at T w = 1 , and the diffuse boundary condition [34] is used.
In the simulations, when Kn = 10 and 1, we use 10 0 × 10 0 nonuniform discrete velocity points [19]   points, respectively. Independence of results with respect to the number of discrete velocity point is already validated. The CFL number η in both methods are set to be 0.5 unless otherwise stated. It should be noted that in what follows the "resolved" result means the solution is mesh independent; the velocity and heat flux presented are normalized by U w and p 0 U w , respectively, where p 0 is the initial pressure.
Figs. 5-7 show the velocity and heat flux profiles along the horizontal and vertical centerlines of the cavity when Kn = 10 , 1 and 0.1, respectively. In order to compare accuracy of these two methods, the results on different mesh resolutions are presented, and with a mesh of 64 2 the results are already well-resolved. The results of the full Boltzmann equation solved by the fast spectral method (FSM) are also included for comparison [18,19] . As we can see, the resolved velocity profiles agree well with those from the FSM, however, discrepancies are observed for the heat flux, despite that the resolved results of the GDVM and DUGKS agree well with each other. This can be attributed to that the GDVM and DUGKS are obtained based on the simplified Boltzmann model equation, while the FSM solves the full Boltzmann model. In addition, the heat flux, a high-order moment of the velocity distribution function, is more sensitive to the collision model than low-order ones.
In addition, as shown in Figs. 5 (d), 6 (d) and 7 (d), the GDVM with 32 × 32 grid points underestimates the peak value of the vertical heat flux Qy adjacent to the right wall, while the DUGKS results with the same coarse mesh are in reasonable agreement with those of the fine mesh of 64 2 . For instance, for the case of GDVM at Kn = 10 with the mesh of 32 × 32, the maximum relative error of Qy is about 38.2% compared with the resolved results, while it is about 11.1 % for the DUGKS counterpart. Additionally, it is interesting to note that there is no such clear discrepancy for the horizontal heat flux Qx . This is because the variation of Qy along the horizontal direction is more intensive than that of Qx along the vertical direction. With such coarse mesh in non-smooth region, the thirdorder accurate upwind scheme in which the numerical stencil expands to large distance, may produce large error. The second-order accurate upwind scheme is also tested with coarse mesh of 32 2 and it captures this Qy peak much better than the high order one with the same mesh, but the results of the high-order GDVM is still overall better than those of the second-order one. Fig. 8 shows the velocity and heat flux along centerlines of the cavity for Kn = 0 . 0259 ( Re = 10 ) in the early slip regime. It is usually recognized that it is difficult for the traditional DVM in this regime due to requirement of fine meshes. However, we note that the resolved results of the GDVM and DUGKS with the fine mesh of 64 2 are in excellent agreement with each other, although visible deviations from the FSM results are still observed for the highorder moment, i.e., the heat flux. This indicates that the mesh requirement of the GDVM in such regime is acceptable due to the use of high-order approximation. In addition, with a coarse mesh of 32 2 , the vertical velocity V and the heat flux Qy computed by the DUGKS are slightly better than those from the GDVM. The same conclusion can be drawn for the flows in the transition and free molecular regimes.
The results at Kn = 6 . 47 × 10 −4 ( Re = 400 ) in the hydrodynamic regime are also presented, where the benchmark NS solutions are available [46] . Figs. 9 (a) and (b) show the horizontal and vertical velocity profiles along the centerlines of the cavity. It is found that with the coarse mesh, the results of DUGKS are much better than those of the GDVM. For example, as shown in Fig. 9 (a), with the mesh of 32 2 , the GDVM underestimates the U -velocity boundary layer adjacent to the top wall, whereas the DUGKS can accurately capture this velocity boundary layer with such coarse resolution. This indicates that the GDVM is more dissipative than the DUGKS. In addition, we also observe that the DUGKS is not so sensitive to mesh resolutions as the GDVM. This is because that even in this regime DUGKS still preserves the second-order spatial accuracy [34,35] . Similar observations can be obtained from Figs. 9 (c) and (d). In these two figures, we, respectively, plot the U and V velocity distributions on different mesh resolutions; the well-resolved results of DUGKS with the finest mesh of 128 2 are regarded as the reference solutions. It is observed that the results of GDVM with a mesh of 64 2 clearly deviate from the reference solutions, particularly around the cavity corners and vortex centers, while the DUGKS with the same mesh can adequately resolve the flow field. This is consistent with the analysis in Section 2.4 that the DUGKS is more accurate than the GDVM in the continuum regime. Fig. 10 gives the grid independent results of the U -velocity along the vertical centerline and the V -velocity along the horizontal centerline, obtained from the GDVM and DUGKS simulations. The results are validated by the DSMC data [45] for rarefied flow (Kn = 10 , 1 , 0 . 1) and the high resolution NS data [46] for continuum flow (Re = 400) . It is clearly found that, with sufficient grids, the results of GDVM and DUGKS are in excellent agreement with the benchmark solutions, and the number of grid points required by the DUGKS is only half of that for the GDVM.
It is usually recognized that, for highly rarefied flows, with few collisions between gas molecules, the mean free path is larger than the flow characteristic length, while in the near hydrodynamic regime, frequent molecular collisions occur, so the flow characteristic length is much larger than the mean free path. Therefore, the mesh size could be smaller than the mean free path in the highly rarefied regime, and vice versa in the near hydrodynamic regimes.
Since the mean free path λ = KnH, and the mesh size where H is the characteristic length, and N is the grid points in one direction, so the ratio of gas mean free path and the mesh size is δ = KnN. Therefore, for a given Kn , larger δ suggests that the method requires finer mesh to resolve the flow field. Table 2 gives the value of δ at different Knudsen numbers. It is clearly found  [45] for rarefied flow and the high resolution NS data given by Ghia et al. [46] for continuum flow are also included for comparison.

Table 2
The ratio of the gas mean free path λ and the resolved grid size x which gives mesh independent results, denoted by δ = λ/ x , at different Knudsen numbers. that the relations between resolved mesh size and gas mean free path in different flow regimes are consistent with the above analysis. Also, it is noted that the DUGKS indeed requires fewer grid points to resolve the given flow field. Distinct algorithm design of the GDVM and DUGKS may lead to different convergent processes. Fig. 11 depicts evolution of the relative global error defined by Eq. (33) at different Knudsen numbers. In addition to the results with the same CFL number, the results of GDVM with a large CFL number up to η = 10 4 , say implicit GDVM, are also included. As we can see from Figs. 11 (a) and (b), error evolution of both methods in the transition and free molecular regimes are almost identical to each other. However, when approaching to the slip and hydrodynamic regimes, as shown in Figs. 11 (c) and (d), the convergence rate of DUGKS is apparently faster than that of the GDVM. Furthermore, we also note that the implicit GDVM converges about two orders of magnitude faster than the GDVM and DUGKS in highly rarefied regime, while in the continuum region, as shown in Fig. 11 (d), the convergence rate of DUGKS turns to be two times faster than that of GDVM as well as the implicit GDVM.
In addition to accuracy, the computational efficiencies of GDVM and DUGKS are also measured. Firstly, we compare the CPU time cost of each iteration. For a fair comparison, the time step is set to be identical in the GDVM and DUGKS. For the case of Kn = 0 . 1 with 64 2 mesh points, the CPU time costs within one time step are 0.1283 s and 0.2965 s for the GDVM and DUGKS simulations, respectively, which indicates that the GDVM is about one time faster than the DUGKS for each iteration. According to our analysis in Section 2.4 , this result is not surprising as the DUGKS with a finite-volume formulation computes more equilibrium state functions than the finite-difference GDVM.
However, as shown in Fig. 11 , there are different convergence rates for the GDVM and DUGKS in various regimes, which lead to different time costs to achieve a converged solution. This assessment includes not only the time cost of these two methods with a same time step, but also the implicit GDVM. Table 3 presents the total CPU time costs to attain the steady-state ( Eq. (33) ) solutions in the various regimes. Note that for the implicit GDVM, the error estimation is performed for each iteration. As expected, in the transition and free molecular regimes, the GDVM is about one time faster than the DUGKS, whereas as Kn decreases, the GDVM becomes slower than the DUGKS due to the faster convergence rate Note that for the implicit GDVM, the error is estimated at one time step.

Table 3
The total CPU time costs (in minute) of the GDVM and DUGKS when the results satisfy the stead-state criterion given by Eq. (33) on the mesh of 64 2 . The results of the implicit GDVM are also included. Note that the convergency criteria for the implicit GDVM is measured at one time step. of DUGKS. Moreover, it is interesting to note that although the efficiency of implicit GDVM is improved by two orders of magnitude in highly rarefied regime, it is still about one time slower than the DUGKS in the hydrodynamic regime.
It should be noted that the above efficiency comparisons are based on the same mesh for the two methods. As shown, the GDVM requires 64 2 mesh points to obtain the resolved results for the flows from early slip to highly rarefied regime, while for the DUGKS, it only needs a coarser mesh of 32 2 . Likewise, for the continuum flow, the mesh requirements for the GDVM and DUGKS are 128 2 and 64 2 , respectively. Therefore, the DUGKS can achieve accurate results with coarser meshes in comparison with the GDVM. Consequently, as shown in Table 4 , to achieve the well-resolved results, the DUGKS is about one order of magnitude faster than Table 4 The total CPU time costs (in minute) of the implicit GDVM and DUGKS when the results are well resolved. The convergency criteria for the implicit GDVM is measured at one time step. the implicit GDVM in the continuum region, and vice versa in the highly rarefied regime. We must emphasize that although the uniform mesh is used in the above simulations, non-uniform meshes can be easily implemented in the finite-difference GDVM and the finite-volume DUGKS [36] . In addition, the unstructured meshes have already been used for the DUGKS simulations [40] . It has been demonstrated that with nonuniform meshes, the efficiency of DUGKS can be improved dramatically to allow large simulations [36,47] . In addition, regarding the parallel computation, it is straightforward for both the GDVM and DUGKS to decompose molecular velocity space. However, if the physical domain decomposition is of interest, the parallel implementation of this implicit GDVM is considerably more challenging than the DUGKS counterpart.

Conclusions
The main objective of this work is to quantify the computational performance of different DVMs, so that researchers may choose the most appropriate method for their applications. Our results show that both the GDVM and DUGKS can accurately reproduce the results in all the flow regimes, provided that the mesh resolution is sufficient. Meanwhile, it is found that the DUGKS is less dissipative and consequently requires a much smaller number of grid points than the GDVM, especially in the continuum and near-continuum regimes. For the GDVM, the convection term of the kinetic model is approximated by the upwind scheme with the underlying assumption of molecular free streaming between two grid points, while in the DUGKS the collision and transport processes are coupled physically by using the discrete characteristic solution of the kinetic equation. Therefore, even with a third-order discretization, the GDVM is not as accurate as the second-order DUGKS, particularly in near hydrodynamic regimes.
The efficiency and convergence rate of the GDVM and DUGKS are also compared. Our results show that with the same mesh for each iteration, the CPU time cost of the DUGKS is about twice that of the GDVM, which is not surprising that the finite-volume DUGKS computes more equilibrium state distribution functions when compared to the GDVM with a finite-difference formulation. In addition, when using the same time step and spatial mesh, the GDVM and DUGKS show similar convergence rates for the flows ranging from the free molecular to early slip, so that the GDVM is about twice as fast as the DUGKS; when using a large time step, the implicit GDVM is faster than the explicit DUGKS by about two orders of magnitude. However, as the flow approaches to the hydrodynamic regime in which molecular collisions dominate, the DUGKS converges faster, consequently, it turns out to be twice as fast as the implicit GDVM. It should be noted that in order to achieve results in reasonable accuracy, the DUGKS requires fewer mesh points than the GDVM, therefore, the overall computational efficiency of DUGKS can be improved by one order of magnitude.
In summary, the DUGKS is preferable for flow problems involving different flow regimes, while if only the steady-state solution of highly rarefied flows is of interest, the implicit GDVM, which can boost the GDVM convergence rate by two orders of magnitude, is a better choice.