An immersed method based on cut-cells for the simulation of 2D incompressible fluid flows past solid structures

We present a cut-cell method for the simulation of 2D incompressible flows past obstacles. It consists in using the MAC scheme on cartesian grids and imposing Dirchlet boundary conditions for the velocity field on the boundary of solid structures following the Shortley-Weller formulation. In order to ensure local conservation properties, viscous and convecting terms are discretized in a finite volume way. The scheme is second order implicit in time for the linear part, the linear systems are solved by the use of the capacitance matrix method for non-moving obstacles. Numerical results of flows around an impulsively started circular cylinder are presented which confirm the efficiency of the method, for Reynolds numbers 1000 and 3000. An example of flows around a moving rigid body at Reynolds number 800 is also shown, a solver using the PETSc-Library has been prefered in this context to solve the linear systems.


Introduction
For some decades, many researchers and engineers have been considering the numerical solution of fluid flows, for different kind of fluids and different geometries. With the increasing performance of super-computers, it has been possible to tackle more and more challenging problems, for higher Reynolds numbers and complex geometries. Several different discretization techniques can be used to consider these problems: Finite Element Methods, Finite Volumes Methods, Spectral Methods, and Finite Difference Methods. The MAC scheme on cartesian grids [15] can be viewed both as a Finite Volume Method or a Finite Difference Method on staggered grids, and is adapted to 2D or 3D flows in simple geometries, for example for lid-driven cavity or backward facing step. To take into account some obstacles in the flows (or complex geometries), Figure 1. The solid body Ω S with boundary Γ S and the surrounding computational domain Ω F in which the flow is to be simulated.
2. The Setting of the Problem 2.1. Flows past obstacles. We consider a flow in a two-dimensional domain Ω = (0, L)×(0, H) which contains a domain Ω S occupied by the solid which is supposed to be fixed for the sake of simplicity. We denote then Ω F = Ω \ Ω S the domain occupied by the fluid (see Fig. 1).
The velocity field in the fluid satisfies the Navier-Stokes equations, with no-slip boundary conditions. We consider then the problem: ∇ · u = 0, (2) u(x, t = 0) = u 0 , where u(x, t) = (u, v) is the velocity field at x = (x, y) ∈ Ω F at time t > 0, u 0 is the initial condition and Re is the Reynolds number. We impose homogeneous Dirichlet boundary conditions for the velocity field on ∂Ω F : We mention that non-homogenous Dirichlet boundary conditions can also be treated with the method presented here.

2.2.
Discretization. For the time-discretization of (1)-(3), we use a second-order backward difference (BDF2) projection scheme. In a first step, the velocity field is advanced in time with a semi-implicit scheme decoupling the velocity and pressure unknowns. Then, the intermediate velocity is projected in order to obtain a free-divergence velocity field.
Let δt > 0 stand for the time step and t k = k δt discrete time values. Let us consider that (u j , P j ) are known for j ≤ k. The computation of (u k+1 , P k+1 ) needs two steps: with homogeneous Dirichlet boundary condition for u k+1 . Then the intermediate velocity field u k+1 is projected in the free-divergence space to get u k+1 : For the spatial discretization, we modify the MAC scheme near the boundary by changing the location of the unknowns of the velocity components for the cells cut by the solid as depicted on Fig. 2 , the pressure unknowns remaining in their original place (see [7] for more details). To discretize the Laplacian in (5), we must replace the five-point formula by a six-point discretization. For the convective terms, the fluxes are computed at the midle of the vertical and horizontal edges (see Fig. 3). For the pressure, linear interpolation are used rather than chang-  ing the location of the pressure unknowns. The same kind of linear interpolation is used to get consistant evaluation of the pressure gradient in (6).
Although the truncation error is only first order in space for the resulting numerical scheme, the second order accuracy is recovered which is due to a superconvergence phenomena analoguous to those proven in [20]. This second order has been observed in [16] by showing results in comparison with those of [13].

3.1.
A fast parallel direct solver to treat fixed solid strutures. When considering fixed solid bodies, the use of a direct solver with a preprocessing procedure is efficient. Once the preprocessing computations have been done, the cost paid to solve the linear systems is about twice the case of a numerical simulation in the same computational domain without obstacles. We summarize hereafter the fast direct solver derived from the capacitance matrix method and adapted to the case of non uniform grids (see [7] and [8] for the details) which has been implemented in our code.
After spatial discretization of the Navier-Stokes equations, one linear equation is obtained per node in the part of the computational domain filled by the fluid and per unknown that is u, v and p. We complete these sets of linear equations by adding similar ones for nodes of the cartesian grid lying inside the solid obstacle but with zero as right-hand side. The unknowns corresponding to mesh points in Ω S h are fictitious ones. As in Ω F , the numerical scheme accounts for the boundary conditions on Γ S h , the fluid unknowns are independant to the solid ones. We therefore obtain linear algebraic systems defined on the whole cartesian grid with n x × n y mesh points whose sizes are (n x − 1) × n y for u, n x × (n y − 1) for v and (n x − 1) × (n y − 1) for p. All three linear systems are similar in nature : the resulting matrices have similar structures with five or six non-zero coefficients per row.
Let us consider one of these linear system. We denote by N its size and by A ∈ M N (R) its matrix. Then at each time iteration, we have to solve a linear system (7) AX = Z with the right-hand side Z computed from the velocity and the pressure at previous time steps.
As it is mentioned above, the matrix A is non-symmetric. Let us consider now the matrix G obtained with the same discretization on the whole computational domain Ω totally filled by a fluid that is with no obstacles. The matrices A and G differ only on rows corresponding to computational meshes for which the five-point stencil interacts with a cut-cell. Let us denote by n c this total number of rows, namely rows such that A − G have non-vanishing coefficients. The efficiency of our direct solver is due to the fact that n c is small compared with N and that the non-zero coefficients on each row of A − G is bounded. The linear system (7) can be rewritten as where Q is a matrix of dimensions N × n c with one non-vanishing coefficient per column, equal to one, and Y ∈ R nc such that Q Y = (A − G) X. It can be easily shown, using Q t Q = I nc , that Y is solution of the following linear system Based on these relations, the algorithm implemented to solve (7) consists in a preprocessing step where the matrix I nc + M G −1 Q is factorized (we use a LU -factorization) followed by i) Compute Z and solve GW = Z ; ii) Compute M W and solve (9) ; iii) Compute QY and solve GX = Z − QY . Recalling that G is the matrix corresponding to the standard MAC scheme on the whole computational mesh, steps i) and iii) can be performed by using any efficient solvers available on cartesian grids. In the present work, we use Discrete Fourier transforms in the vertical direction (where the mesh is uniform) combined with LU -factorizations of the resulting tridiagonal systems.
The parallel version of this direct solver is based on explicit communications performed by calling functions of the MPI library. The main feature of MPI is that a parallel application consists in running p independant processes which may be executed on different computers, processors or cores. These processes can exchange datas by sending/receiving messages via a network connecting all the involved computing units.
The first step when developping a parallel algorithm is to define a suitable and efficient splitting of the datas among the MPI processes : each MPI process will treat datas associated with a part of the total computational mesh. For our problem, this choice is straightforward and is related to the algorithm used to solve the linear systems. Indeed, it is much easier to implement a parallel resolution of tridiagonal linear systems rather than a parallel version of the DFT. Therefore, the parallel version of the code is based on a splitting of the datas along the horizontal axis, so that each MPI process works with a vertical slice of the computational mesh as it is illustrated on Figure 4.
In the framework of finite volume or finite difference schemes on cartesian grids, the explicit computation of spatial derivatives is local and involves very few communications. The only tricky part concerns the resolution of the linear systems. The step ii) of the direct solver described in the previous section consists in solving a linear system involving the matrix I nc + M G −1 Q. As the LU -factorization of this matrix has been computed and stored in a pre-processing step at the beginning of the time iterations, we have to solve two triangular systems which can not be efficiently performed on parallel computers. As n c is small compared to the size of the global problem, we choose to dedicate this task to one given MPI process (fixed in advance) per unknown, that is u, v and p. Once these linear systems are solved, the resulting vectors are scattered from these MPI processes to the other ones.
As it was previously mentioned, linear systems of steps i) and ii) are solved by first applying a DFT in the y-direction : these computations are independant and can be performed without any communications due to the distribution of datas among the MPI processes. This results in a collection, one per grid point in the y-direction, of independant tridiagonal linear systems connecting all nodes of the mesh in the x-direction. A parallel direct solver based on the divide and conquer approach (DAC) for tridiagonal matrices has been implemented (see [6]). The DAC method, applied to solve one tridiagonal linear system on n p > 1 MPI processes, consists in splitting the tridiagonal matrix into n p independant blocks (one per MPI process). The solutions of these systems have to be corrected in order to recover the solution of the global system. These corrections correspond to 2n p −1 values which are solutions of a tridiagonal linear system of size 2n p − 1. This phase of the DAC method is sequential and has to be performed on one process inducing a useless waiting time for the other processes. However, as we have to solve n y such systems simultaneously, this sequential part can be distributed among all the n p processes. In this context, the DAC algorithm leads to an efficient parallel code.
The parallel code has a good level of performance : less than 15% of the CPU time is spent in communications between MPI processes. The sequential part performed on one process represents a negligible amount of CPU time. The computations presented here have been performed on a DELL cluster using up to 32 cores of Xeon processors. A low latency bandwith network connects the cluster nodes.

3.2.
Iterative solver for the case of moving bodies. For solid bodies moving in a computational domain filled by a fluid, as the case considered in Section 4.2, cut-cells may change at each time iteration. Therefore, the coefficients of the matrices of the linear systems for the velocity components, issued from the discretizetion of the momentum equations, and for the pressure increment computed in the projection step of the time scheme, have to be recomputed at each time step. In that context, the use of the fast direct solver described in the previous section is cumbersome and inefficient except on coarse meshes. In order to be able to treat such configurations, we have implemented a PETSc version [2,3] of our cut-cell scheme. The main advantage of the PETSc Library is that, in a parallel programming environment based on MPI, many iterative solvers combined with different preconditionners can be used. The choice can be made at run time.

4.1.
Flow past a circular cylinder at Re = 1000 and 3000. In this section, we present numerical simulations, performed with the parallel direct solver method described in Section 3.1. We consider the case of flows past a fixed circular cylinder of diameter D. The Reynolds number is defined based on the diameter D of the cylinder, i.e. Re = U ∞ D/ν where U ∞ is the horizontal free stream velocity. As non-dimensional time, we consider T = 2U ∞ t/D.
A circular cylinder of diameter equals to unity is centered at the origin of the computational domain Ω = (−L x , L x ) × (−L y , L y ). As boundary conditions, a uniform velocity profile u(x = −L x , t) = (1, 0) is imposed at the inflow and a convective boundary condition is applied at the exit, namely the convective equation (10) ∂u ∂t + (1, 0) · ∇u = 0 is solved at x = L x . On the top and bottom boundaries, that is y = ±L y , slip boundary conditions are used that is ∂u ∂n = 0 and v = 0. Finally, no-slip (u| Γ S = 0) boundary condition is applied on the surface of the obstacle.
For this problem important quantities reflecting the dynamics of the vortices formed in the vicinity of the solid boundary and developping at the rear of the cylinder are the pressure drag and lift coefficients. They are derived from the total drag force on the body, which is computed as The pressure drag and lift coefficients C p and C are given by C p = 2F b · e x and C = 2F b · e y and the (total) drag coefficient is C d = C p + C . Starting with a flow at rest, the drag coefficient behaves as T −1/2 in the early stage of the development of the vortices. This squareroot singularity has been theoretically predicted by Bar-Lev and Yang in [4]. They have derived the following expression for the (total) drag coefficient As whown in Figure 5, the values obtained with the cut-cell scheme on a grid with 4096 × 8192 mesh points discretizing the domain Ω = (−10, 10) 2 perfectly match the theoretical curve drawing (12) on the time interval T ∈ [0, 0.2] for Re = 1000. Our cut-cell method captures the square-root singularity of the drag coefficient. This simulation has been run using 16 MPI processes. On longer time interval, namely T ∈ [0, 5], the results are in good agreement with those obtained by Koumoutsakos and Leonard in [18] with a vortex method. In order to test the grid convergence of these results, the same simulation has been conducted on a grid with two times more points in both spatial directions, that is 8192 × 16384 mesh points, in the same computational domain. In that case, 32 MPI processes have been used. Both results are almost indistinguishable on Figure 6 indicating that the coarser resolution is enough to capture the essential features of the flow at this Reynolds number. The development of the flow around the impulsively started cylinder at Re = 1000 can be seen on Figure 7. In the early stage T ≤ 1, a primary vortex develops in the vinicity of the boundary at the rear of the cylinder. Then for T ∈ [1, 2] a secondary vortex appears trying to move insight the primary vortex and to push it away from the solid boundary (T ≥ 4). A tertiary vortex is visible at T = 3 which remains sticked to the boundary constrained by the two other vortices having more strength. These results compare well with the same flow representations shown in [18]. As expected on short time interval (T ≤ 5) the flow remains symmetric. At Re = 3000, the time evolution of the drag coefficient plotted on Figure 8 exhibits also the square-root singularity on short time interval and is in good agreement with the results of Koumoutsakos and Leonard. As expected, the drag coefficients remains almost constant for T ∈ [2, 3] (see [18]). Note that on this time interval, a small difference exists between the results computed on the two different grids. However, the coarser simulation is fine enough to capture the flow dynamics at Re = 3000. The mesh size of the coarser grid, which is constant , is an open question. This will be addressed in further works. At this Reynolds number, a scenario similar than that at Re = 1000 can be observed on Figure 9 with the development of three vortices in the early stage of the flow dynamics. The secondary vortex penetrates further inside the primary vortex aera and the tertiary vortex has more strength as it could be expected with less effects of the viscous forces at Re = 3000. Again, an overall good agreement is found with the vortcity contours shown in [18] at the same Reynolds number and time T . As previously mentioned, the flow remains symmetric at the beginning of the simulations for these flows around an impulsively started cylinder. By carrying the time integration over a much longer time interval T ∈ [0, 200] instabilities due to round-off errors and to the nonlinearity of the system develop so that the flow becomes non symmetric for T ≥ 100 at Re = 1000 and T ≥ 50 at Re = 3000 as it can be seen on Figures 10 and 11 representing the time history of the drag coefficient. After a transient period, an increase of the drag coeffient is observed which stabilizes and oscillates around a mean value.

4.2.
Flow around moving bodies. The purpose of this section is to show that the present numerical method is also able to simulate incompressible flows around moving bodies. Let us consider a cylinder which starts to move impulsively at t = 0 with the sinusoidal translational motion (13) u body (t) = 2 sin t 2 ; 0 in a fluid initially at rest for Reynolds number 800. We suppose that the fluid is confined within a rectangular computational domain Ω = [−3; 3]×[−1; 1] with no-slip boundary condition on ∂Ω F . The diameter of the cylinder is equal to 1 and it is initially centered at the origin. The boundary condition (13) at the body surface ∂Ω S is enforced through the non exhaustive following right hand side terms which vanish in the case of fixed obstacle : u(κ u,S i,j ), u(ξ S i,j−1/2 , y j−1/2 ), u body (κ S i,j ) and so on. This terms are respectively taking into account in the convective terms, Laplacian operator and continuity equation. More details can be found in [7].
As the obstacle moves from one time step to another, we have to update matrices of the linear systems corresponding to Poisson and momentum equations at each iteration. Therefore, in such configuration, the direct solver requires much more CPU time compared to some iterative solver, which does not require a preprocessing step. For large problems, the faster solver we have found is an algebraic multigrid method (HYPRE BoomerAMG) implemented using the PETSc library [2,3].
A constant mesh size h = 5 × 10 −3 is used in both directions and the value of the time step, satisfying a CFL stability condition, is 10 −3 . As shown in Figure 12, vortices interact with each other and also with the boundaries. The flow remains perfectly symmetric until t = 6π, thereafter the symmetry of the flow is lost due to rounding errors inherent in computer calculations (see Figure 13).

Conclusion
We have presented a cut-cell method for the numerical solution of flows past obstacles. We have detailed the numerical method and the computational aspects for fixed obstacles, and shown numerical results for fixed and moving rigid bodies. The parallel version of the algorithm presented here allows computations of flows at Reynolds number up to 3000. The numerical tests confirm some results of the literature, a good agreement is observed with the numerical t = 0 t = π t = 2π t = 3π simulations in [18]. We have also shown that the computation of the drag coefficient matches the theoretical square-root singularity predicted by [4]. The choice of the size of the box (compared with the grid size h) is one of the questions that we would like to investigate with this method in further works, we also would like to deal with rigid bodies following the fluid flow.