A preconditioned lattice Boltzmann flux solver for steady flows on unstructured hexahedral grids

Abstract The lattice Boltzmann flux solver (LBFS), first introduced by Shu et al. (2014) on structured meshes, allows fluid flow problems to be solved on unstructured meshes discretised by the finite volume method. The solver calculates the macroscopic fluxes at the cell interfaces from a local reconstruction of the lattice Boltzmann solution. In this paper the LBFS is extended to three-dimensional unstructured hexahedral meshes and a preconditioned lattice Boltzmann flux solver (PLBFS) is presented. The PLBFS involves applying the preconditioning technique proposed by Guo (2004) to the LBFS and is achieved by modifying the equilibrium distribution function used to calculate the macroscopic fluxes at the cell interface. When the PLBFS is applied to steady flow problems, it is shown that convergence is significantly accelerated and the accuracy of predictions with unstructured grids is greatly improved when compared to the LBFS. This paper also introduces a strategy for choosing the optimal value of preconditioning factor with unstructured hexahedral meshes.


Introduction
In recent years the lattice Boltzmann method (LBM) has become a well established alternative for solving computational fluid dynamics (CFD) problems. It involves calculating the change in the density distribution of discrete particles at the mesoscopic level. The change in density distribution is due to the collision of the discrete particles and the subsequent streaming of the particles. Using a Chapman-Enskog expansion, it can be shown that the lattice Boltzmann equation (LBE) applied at a mesoscopic level is equivalent to the Navier-Stokes (NS) equations at a macroscopic level [1] . The LBM has many advantages compared with traditional NS equations solvers when solving transient problems as it does not involve the solution of the expensive Poisson equation for pressure and uses a highly algebraic and parrellisable approach to calculating densities and velocities in a flow problem. Limitations of the standard LBM include the necessary use of a uniform Cartesian grid and the restriction of its use to flow problems with low Mach numbers. The required use of a uniform Cartesian grid is a major restriction when it comes to the modelling of real life engineering problems such as the haemodynamics of biomedical de-vices and the aerodynamics of cars and aeroplanes. Issues encountered include approximating curved boundaries using a staircase approximation and domain wide refinement of the grid to resolve boundary layer fluid dynamics. Using a staircase approximation introduces a geometrical discretisation error while domain wide refinement of the grid leads to excessive computational effort. This has resulted in much research into fully unstructured LBMs which would enable the use of body fitted grids with complex geometries and have local refinement near boundaries.
The first attempt at an unstructured LBM was by Peng [2,3] and Xi [4] . This approach involved the integration of the differential form of the LBE in control volumes around the grid points. However it was found that the method suffered from significant instability issues. Over the years improvements to the stability of finite volume -lattice Boltzmann methods (FVLBM) have been made. Stiebler [5] introduced a least squares, linear reconstruction based upwind discretization scheme for the FVLBM. A total variation diminishing approach to the FVLBM was introduced by Patil [6] whereas Ubertini et al. [7] used a memory term to increase stability thresholds of the method. More recently Zarghami et al. [8] used upwind second order pressure biasing factors as flux correctors to improve stability.
An alternative innovative approach was proposed by Wang et al. [9] and Shu et al. [10] when they developed the lattice Boltzmann flux solver (LBFS). This involves discretising the domain into cells and calculating the macroscopic fluxes at the cell interface using https://doi.org/10.1016/j.compfluid.2020.104634 0045-7930/© 2020 Elsevier Ltd. All rights reserved. a local reconstruction of the LBE. Unlike in the FVLBMs discussed above, the LBFS involves solving conservation equations for the macroscopic variables whereas the FVLBM involves solving conservation equations of the particle distributions. While the original method was implemented on non-uniform orthogonal meshes, lately it has been applied to fully unstructured 2D tetrahedral meshes by Pellerin et al. [11] , Wu et al. [12] and more recently Liu et al. [13] . The latter introduces a high order least-squares approach that improves the order of accuracy of the LBFS from 2nd order to 4th order. The LBFS also been extended to model fluidstructure interactions on 3D geometries through the use of the immersed-boundary method by Wang et al. [14] .
The LBFS was adopted in this work due to its advantages over the FVLBM with regards to stability and the implementation of boundary conditions. The LBFS does not require the use of pressure biasing factors or other such schemes to achieve stability at higher Reynolds numbers. The LBFS also allows the direct implementation of physical boundary conditions whereas the FVLBM relies on the standard LBM family of boundary conditions which are more difficult to implement.
The LBFS shares many traits with the family of artificial compressibility methods (ACM) which was initially introduced by Chorin [15] . In the ACM, an artificial relationship is introduced between pressure and density variables. This changes the nature of the governing equations from mixed elliptic/parabolic nature into the hyperbolic/parabolic. This enables the use of efficient timestepping schemes in calculating the steady state solution of incompressible flows. The LBFS differs from the ACM, in that the relationship between pressure and density is set by an arbitrary parameter in the ACM [15] . More recently effort s by Turkel [16] and Malan [17] have provided a means of calculating the optimum value of this arbitrary parameter. In the LBFS the relationship between pressure and density is defined by the lattice discretisation and there is no need to optimise the arbitrary parameter required in the ACM. The ACM and LBFS both solve the NS equations, with the ACM using either a finite difference [18] or finite volume [17] spatial discretisation. The LBFS uses a finite volume approach. The ACM solves the NS equations but uses a perturbed continuity equation and, as described by He et al [19] , this has no physical meaning in incompressible flow. In contrast the LBFS is a âǣweaklyâǥ compressible method and its continuity equation has a physical meaning. There is also a difference in the stencils used in the calculation of the fluxes; the original ACM proposed a central differencing or leapfrog scheme [15] and more recently 3rd order upwind and 4th order central compact schemes have been used [20] . In comparison the LBFS uses a local reconstruction of the LBE to calculate the fluxes. Further differences arise in that the ACM requires the incorporation of artificial dissipation whereas the LBFS does not [18] [17] . According to Kozela [21] , the ACM is shown to require dual-stepping or similar implicit methods to effect real time accuracy for unsteady flows whereas the LBFS can effect real time accuracy using explicit and implicit methods. Finally, while the original ACM can be thought of as a preconditioning approach to the continuity equation, this has also been extended to introduce preconditioning to the momentum equations [22] . This has the impact of improving the convergence and robustness of the ACM approach.
A key characteristic of the standard LBM is the grid independent "compressibility" error which is directly proportional to the Mach number [23] . As a result LBM simulations limit this error by keeping the Mach number small, typically this involves keeping the Mach number less than 0.4 [24] . Reducing the Mach number has the knock on effect of increasing the disparity between the acoustic and convective wave speeds [25] . As the standard LBM typically employs explicit time-stepping, the Courant-Friedrichs-Levy (CFL) condition should be satisfied to ensure stability. At low Mach numbers this requires a time step inversely proportional to the largest eigenvalue in the system which is approximately the speed of sound. However the convective wave propagates information through the domain at the much lower fluid speed. As a result a large amount of time steps are required to reach steady-state convergence.
In recent years many researchers have made attempts to accelerate convergence of the standard LBM to steady-state convergence. A time independent formulation of the LBM has been proposed by Bernaschi et al. [26] . The steady-state solution is solved through iterative methods and this approach has also been implemented by Verberg [27] and Noble [28] . Tolke et al. [29] expanded on the time independent approach by using a multigrid approach to solve an implicit second-order finite difference scheme. An alternative approach for accelerating convergence of the LBM to steadystate is using implicit schemes to discretise the time-dependent equation. This allows larger time steps to be used and has been implemented by Lee [30] , Seta [31] and Tolke et al. [32] . Further to this Mavriplis [33] proposed a non-linear form of multigrid solver with a non-linear LBE time-stepping scheme. These methods all accelerate convergence of the LBM to steady-state but at the cost of increased complexity compared to the standard LBM.
Preconditioning is another time-dependent steady-state acceleration technique which was successfully applied to the LBM by Guo [34] originally. This approach follows the same principal as the preconditioning method developed by Turkel [25] to solve the incompressible and low speed compressible NS equations. As mentioned above, at low Mach numbers there is a disparity between the acoustic and convective wave speeds. The ratio between these wave speeds is known as the condition number. The lower the condition number, the faster that a solution will converge to the steady-state solution. Preconditioning involves altering the eigenvalues of the NS equations to reduce the condition number. Guo implemented preconditioning in the LBM by applying a single preconditioning factor ( γ -preconditioner) to the equilibrium distribution function. For steady flows this results in an equivalent form of the NS equations with a reduced condition number, which reduces the number of iterations required to reach steady-state.
There have been many additions to Guo's original work. Premnath et al. [35] extended the preconditioning approach to allow for forcing terms in force-driven fluid flow problems. Izquierdo et al. [36] extended it to the generalised form of the LBM including the multiple-relaxation time LBM. In this work a second preconditioning factor ( β-preconditioner) was also used to improve the efficacy of preconditioning. They also investigated optimal values of the preconditioning values for the LBM and gave apriori guidelines for such values [37] . More recently the approach has been extended to a noncascaded central moments LBM [38] , a cascaded LBM [39] and a Gailliean invariant cascaded LBM [40] . Meng et al. have also used an improved preconditioned multiple-relaxation time LBM to model flow through porous media [41] .
In this work the γ -preconditioning approach of Guo's work is adopted as it has similar performance to the β-preconditioning approach of Izquierdo's work whilst being more efficient and also simpler to implement. It is applied to a LBFS on unstructured hexahedral grids and its performance is evaluated on two fluid problems: 3D lid-driven cavity flow and 3D flow over a stationary cylinder. The structure of the remainder of the paper is as follows: Section 2 introduces the LBFS and the preconditioned lattice Boltzmann flux solver (PLBFS), Section 3 discusses the use of preconditioning with unstructured grids and proposes a strategy for the optimal choice of preconditioning parameter, in Section 4 the results of numerical simulations are discussed, and a summary and conclusions are presented in Section 5 .

Navier-stokes equations
If body forces are neglected, the 3D unsteady NS equations can be written in conservative form for a finite control volume in a Cartesian coordinate system as: where t is time, V is the cell volume, S is the cell surface, n is the outward normal to element of area dS , Q is the vector of conserved variables [ ρ, ρu, ρv, ρw ], ρ is the density, and u, v and w are the components of the velocity vector V in the x, y and z directions respectively. Assuming a Newtonian, isotropic, isothermal fluid allows the flux tensor F to be defined as follows: where p is the static pressure and μ is the dynamic viscosity.
The above equations can be used to simulate incompressible flow, when the Mach number is low and density variation is small.
Using a cell-centred finite volume approach for spatial discretisation, Eq. (1) is applied to every cell in the computational domain yielding a set of semi-discrete equations: where for cell i , Q i is the vector of conserved variables, V i is the cell volume, N faces is the number of cell faces, n i,n is the outward normal of face n, S i,n is the area of face n on cell, and F i,n is the flux tensor at the centroid of face n .

Lattice boltzmann flux solver
In the LBFS the flux tensor at each cell interface is evaluated from a local reconstruction of the lattice Boltzmann solution (LBS) at the cell interface. The cell interface is defined as the centroid of the shared face between the two adjoining cells. In this work the single relaxation time Bhatnagar-Gross-Krook (BGK) collision model is employed. This leads to the following formulation for the flux tensor F : e α,y e α,z e α,x e α,x e α,y e α,x e α,z e α,x e α,x e α,y e α,y e α,y e α,z e α,y e α,x e α,z e α,y e α,z e α,z e α,z x e α,x e α,y e α,x e α,z e α,x e α,x e α,y e α,y e α,y e α,z e α,y e α,x e α,z e α,y e α,z e α,z e α,z ⎤ ⎥ ⎥ ⎦ (4) where e α,x , e α,y and e α,z are the x, y and z components respectively of the particle velocity vector e α in the α direction, τ s is the standard relaxation factor, f α is the density distribution function in the α direction, f eq α is the equilibrium density distribution function in the α direction, f neq α is the non-equilibrium density distribution function in the α direction and is equal to f α − f eq α , and N is the number of velocities in the lattice model.
To implement the LBS at the cell interface, one must first choose a lattice velocity set to define e α . Velocity sets are usually denoted in D d Q q form where d denotes the number of spatial dimensions covered by the velocity set and q is the number of velocities in the set. In this work the D3Q15 velocity set is chosen due to its computational efficiency and is shown in Table 1 . i, j and k are the unit vectors in the x, y and z directions respectively, c = δx/δt, δx is the lattice spacing along each Cartesian axis and δt is the streaming time step. Typically δx is set equal to δt giving c equal to 1 and this approach is adopted in this work. In physical terms c is considered to be the lattice velocity. When the D3Q15 lattice velocity set is applied to the cell interface, the lattice is formed by 15 nodes. The Cartesian coordinates of these lattice nodes can be represented in terms of lattice velocities. A 3D illustration of the D3Q15 lattice velocity set implemented at a cell interface is shown in Fig. 1 . In this figure the Cartesian coordinates of the two cell centres are referred to as r i and r i +1 respectively. The cell interface is referred to as r .
The LBE with the BGK collision model [42] can be applied to model a Newtonian fluid at the cell interface. This enables the finding of the equilibrium and non-equilibrium density distribution functions required to calculate the flux tensor. The LBE is given by: where the standard relaxation factor τ s is related to the viscosity and controls the influence of the viscous fluxes on the momentum equation. In physical terms it is the rate at which f α tends to the equilibrium density distribution function f eq α and is given by: where ν is the kinematic viscosity and, as per the literature, c s is the speed of sound in the lattice grid equalling c/ √ 3 for a D3Q15 lattice model. The equilibrium density distribution function f eq α is given by a Hermite series expansion of the Maxwell-Boltzmann distribution: where the weights w α are given as: As shown by Shu et al. [14] , by using the Chapman-Enskog expansion and the linear Taylor series expansion of f eq α , the nonequilibrium density distribution function f neq α can be calculated as follows: In the localised reconstruction of the LBE at the cell interface, the flux tensor is now dependent on the post-streaming and prestreaming equilibrium density distribution functions f eq α ( r , t ) and f eq α ( r − e α δt, t − δt) respectively. These in turn are dependent on the density and velocity values at each lattice node. As f α is a density distribution function, the density can be calculated by summing f α up over all lattice velocities and similarly the momentum Local reconstruction of the LBS at a cell interface implementing a D3Q15 lattice velocity set.
can be calculated by summing up the first moment of f α , i.e: From the equation of state for an isothermal ideal gas, the pressure can be found using: When a gas has been left alone for a sufficiently long period of time, it is assumed that f α will reach an equilibrium which is defined by f eq α . We can make this assumption as collisions tend to even out the angular distribution of particle velocities in a gas around a mean velocity. As this convergence to equilibrium must conserve mass and momentum at all locations, equilibrium values can also be used to calculate the macroscopic density and momentum at any location as follows: The pre-streaming equilibrium density distribution function f eq α ( r − e α δt, t − δt) is simply calculated by interpolating the macroscopic values from neighbouring cells at time t − δt: Inputting these values into Eq. (7) gives f eq α ( r − e α δt, t − δt) . The next step is to find a value for f eq α ( r , t ) . This is simply done by finding ρ( r , t ) and V ( r , t ) . As mass and momentum are conserved ρ( r , t ) and V ( r , t ) can be calculated at the cell interface by summing the pre-streaming equilibrium distribution functions, i.e.: Inputting these values of ρ( r , t ) and V ( r , t ) into Eq. (7) will give f eq α ( r , t ) . Once f eq α ( r , t ) and f eq α ( r − e α δt, t − δt) have been found, f neq α ( r , t ) can be calculated and the fluxes at the cell interface calculated using Eq. (4) .

Gradient calculation
A key element of the LBFS is the accurate calculation of the gradient of the macroscopic variables at each cell centre as these gra-dients are then used in Eq. (13) and Eq. (14) to initialise the local LBS at the cell interface. An inaccurate calculation of the gradient will lead to a local LBS that does not reflect the flow field correctly. Two prominent methods used to calculate gradients on unstructured grids are the Green-Gauss and Least-Squares methods. Shima et al. [43] note how the Green-Gauss method is only fully accurate on uniform and symmetric grids but performs better on the thin and curved meshes with high aspect ratio cells that often exist within boundary layers for high Reynolds number flow calculations. Shima proposed a new hybrid Green-Gauss/Weighted-Least-Squares (GLSQ) approach for calculating the gradient. The GLSQ involves a geometry-dependent switch which applies the Green-Gauss method to those cells with a high aspect ratio and the Least-Squares method to cells with a low aspect ratio. This approach also has the benefits of ensuring monotonicity at cell interfaces, increasing the robustness of the solver, and allowing the accurate handling of hanging nodes in the gradient calculation. This is for a trivial increase in computational cost compared to the Least-Squares method. For these reasons the GLSQ approach is adopted in this work and is explained in detail in Appendix A .

Preconditioned lattice boltzmann flux solver
The PLBFS takes the same form as the LBFS with the preconditioned flux tensor F p is given by: (17) and f neq α is defined as where γ is the preconditioning parameter and τ p is the preconditioned relaxation factor. The preconditioning parameter γ relates τ p to the standard relaxation factor τ s in the following way: Letting the preconditioning matrix then through the Chapman-Enskog expansion, the flux tensor in Eq. (1) for the preconditioned NS equations can be shown to be: where p * = γ c 2 s ρ (21) and For steady flows utilising Eq. (20) in Eq. (1) is equivalent to utilising Eq. (2) in Eq. (1) but with a different equation of state. To demonstrate the effect of the preconditioning parameter on the rate of convergence, the partial differential equation form of the preconditioned NS equations is employed: where A , B , C are the Jacobians of the flux vectors given by: and F p,x , F p,y , F p,z are the components of the flux tensor in the x, y and z directions respectively. Considering only the inviscid terms, these are given by: The preconditioned convection matrix PA is now written as: To calculate the eigenvalues of PA , the determinant of ( PA − λI ) is first calculated: The eigenvalues λ are found by letting Eq. (27) equal 0 giving: The aim of preconditioning is to reduce the stiffness of the system by scaling the eigenvalues of PA appropriately. The reduction in stiffness can be measured by the condition number (CN) given as: where Ma is the dimensionless Mach number. A CN close to 1 indicates a well balanced system with no stiffness. From Eq. (29) it can be seen that to achieve a CN close to 1, it is required that Ma → ∞ or γ → 0. As Ma is restricted to values < 0.4 to satisfy incompressiblity requirements, reducing γ can be used to reduce the stiffness of the system.

Time integration
One advantage of the LBFS is that it allows a decoupling of the lattice streaming time step δt and the time integration of the NS equations. This enables the use of many different methods for the time integration of the NS equations in the LBFS. A fourth order Runge-Kutta (RK4) time-stepping scheme is adopted in this work as it has been shown to be very efficient and robust for a cellcentred finite-volume discretisation [44,45] . While acknowledging that steady flow problems do not require high levels of time accu- racy, the RK4 scheme is far more robust than RK1/Forward Euler time-stepping schemes with regards to the stability limits of the magnitude of the viscous eigenvalues (see Fig. 2 ). As will be shown later in Section 3.3 , applying preconditioning increases the viscous eigenvalue of the flow and necessitates the adoption of the RK4 scheme. Rewriting Eq. (3) in terms of a residual gives: d dt where R ( Q i ) is the residual function defined by: and L is the flux spatial discretisation operator defined as follows: A RK4 integration scheme can be typically described as: To ensure the stability of the time-stepping scheme, t must fulfil the CFL condition [46] . A CFL condition for an explicit scheme where the allowable CFL number σ = 1 means that t must be equal to or smaller than the time required to transport information across the stencil of the spatial discretisation scheme. Using higher order integration methods allows higher values of σ to be used. For a RK4 scheme σ = 2 . 78 . There are many ways to estimate the maximum allowable time step on unstructured grids with the following approach adopted in this work [47] : where c and v represent estimates of the spectral radii of the convective and viscous Jacobians respectively for cell i. C is an arbitrary constant which is normally chosen to be C = 4 . In the case of a cell-centred finite-volume scheme, the spectral radii are calculated as: where the Cartesian components of the spectral radii are given as: Calculate the gradients at the cell centres using Eq. (A.1) .

3.
At the cell interface initialise a D3Q15 local latttice Boltzmann solution with δx chosen so that all lattice grid nodes lie within the two cells adjoining the shared surface. 4.

9.
The inviscid and viscous fluxes can then be calculated from f eq α ( r , t ) and f neq α ( r , t ) using Eq. (16) .

10.
The solution can then be advanced in time using Eq. (33) .
where S i,x , S i,y and S i,z represent projections of the cell onto y-z, x-z and x-y planes respectively, i.e.: where S x , S y and S z denote the x, y and z components of the face vector S i,n = n i,n · S i,n , S i,n is the surface area and n i,n is the outward normal of face n of cell i . The time step for the whole domain is then given as: For unstructured meshes, where the ratio of the largest cell to the smallest cell is large, this can result in extremely slow convergence of the solution to steady-state. However the transient behaviour of the flow in steady problems is of no interest. As a result local time-stepping can be employed and each cell is progressed at its maximum allowable stable time step calculated by Eq. (34) . This results in significant convergence acceleration. The computational procedure employed in the PLBFS is summarised in Table 2 .

Overview
One advantage of using an unstructured grid with the LBFS is that it allows local refinement of the mesh in areas of rapidly changing flow. In practice, this involves having a very fine mesh close to surfaces in the flow and a much coarser mesh further away. This is in contrast to the traditional LBM where a uniform mesh density is used throughout the computational domain. When Guo [34] applied preconditioning to the LBM, there was no need to consider the impact of preconditioning on larger convection dominated cells or relatively finer cells in the boundary layer that are viscous dominated. As a result the purpose of that work was to reduce the stiffness of the inviscid Jacobians. In this section it will be shown that the PLBFS can improve the accuracy of flux calculations in larger cells compared to the LBFS but can also have an adverse effect on local time-stepping in viscous dominated cells.
An optimal choice of γ will consider both these factors while try-ing to reduce the CN of the system, and a strategy for making this choice is proposed at the end of this section.

Influence of preconditioning on lattice streaming distance
When choosing a lattice streaming distance δx , the preferred strategy is to maximise the size of the lattice while keeping all lattice nodes within the adjoining cells to the common cell interface. This has the effect of reducing the error in interpolating the macroscopic variables to the lattice nodes using Eq. (13) and Eq. (14) . This is due to the fact that as δx increases, the distance between the lattice nodes and the cell centre, from which the macroscopic variables are interpolated, is reduced. This increases the accuracy of the interpolation as the gradient is calculated at the cell centre and the lattice nodes are closer to the cell centre.
However this leads to issues in relatively large cells that are prevalent in unstructured grids. From Eq. (22) , as δx increases τ p decreases. This has an adverse affect on stability as the LBM becomes unstable as τ p tends to its minimum value 0.5 especially in the range 0.50 < τ p < 0.55 [48] . The special case of inviscid flow i.e. when τ p = 0 . 5 is not subject to this limitation as Eq. (22) is satisfied for an arbitrary lattice streaming distance δx . Preconditioning is most advantageous in scenarios when the stability limit of τ p limits the size of δx . Decreasing γ allows an increase in δx for a constant τ p . The benefits are illustrated by Fig. 3 , where decreasing γ allows a much larger lattice reconstruction without introducing instability by lowering the value of τ p .
As result of this property, preconditioning allows the use of coarser unstructured grids than the unpreconditioned LBFS with increased accuracy. This allows for either a decrease in runtimes for similar levels of accuracy or an increase in accuracy for the same runtime.

Influence of preconditioning on viscous dominated cells
Using preconditioning can have an adverse effect on timestepping in viscous dominated cells, i.e. where the viscous spectral radii is larger than the convective spectral radii, or can cause convection dominated cells to become viscous dominated. To illustrate the impact of γ on the local time step of these relatively fine cells employed in viscous dominated regions, the spectral radii of a cell is calculated using Eq.s (34) to (37) and its assumed that V → 0 as the cell is assumed to neighbour a surface in the flow. For a uniform cell with edge length l , surface area S and volume V the convective and viscous spectral radii become:  The ratio of the viscous spectral radius to the inviscid spectral radius, including the constant in Eq. (34) , is given as: 2 μ γ c s lρ (40) Re-writing in non-dimensional form and defining l relative to the reference length L gives: where is a constant and N = L/l is the number of uniform cells along the reference length L . Values of vc > 1 indicate that the flow in a cell is viscous dominated. If vc > > 1 this means that the time step in Eq. (34) is calculated based off the viscous spectral radius and is not optimised for the inviscid contributions. Fig. 4 illustrates how and cell size influence if a cell is convection or viscous dominated. Cells becomes viscous dominated as the edge length of the cell becomes smaller and the viscous threshold of edge length, where the cell changes from convection domi- First attempt using the optimal precondition parameter provide by Izquierdo [37] where γ = 2 Ma .

2.
If no convergence acceleration is experienced, then for a given Re and mesh, investigate the histogram of cell sizes in the computational domain and identify the most common cell size.
If vc > 1, then decrease Ma , increase γ or increase the size of the smallest cell in the mesh.

Repeat
Step 3 until an acceleration in runtime is achieved or γ = 1 nated to viscous dominated, is dependent on . Smaller values of allow finer cells to remain convection dominated. To reduce the value of , the following choice of non-dimensional parameters is desired: • Reduced values of Ma .
• Larger values of Re .

Preconditioning parameter selection strategy
The purpose of preconditioning the NS equations is to reduce the stiffness of the inviscid Jacobians to accelerate convergence of steady flow calculations. However as shown in the previous sections, the use of preconditioning can have advantages and disadvantages when it comes to the use of unstructured grids. Preconditioning allows the use of larger cells in convection dominated flow Re (42) Note that Eq. (42) will only hold for every cell in uniform grids only. In grids where there is a large variation in cell size, there will be cells that will have vc > 1. Therefore it is recommended to investigate a histogram of cell size of the computational domain and optimise γ for the most common cell sizes.
The impact of the choice of preconditioning parameter on stability should also be considered. Izquierdo suggested the optimal choice of γ = 2 Ma [37] as this value maximises speed up while offering robust stability. However it was provided with the caveat that the optimal value increases for lower Reynolds numbers. A strategy for choosing a value of γ is given in Table 3 . It is preferable to decrease Ma rather than have vc > 1 as lower values of Ma reduces the compressibility error at no cost in runtime when local time-stepping is used. It should be noted that accurately calculating the real eigenvalues in a cell is quite difficult and that

Overview
To demonstrate the capability of the PLBFS, two flow problems are considered: 3D lid-driven cavity flow and 3D flow over a circular cylinder. For both flow problems unstructured hexahedral meshes were used to demonstrate that the PLBFS can be effectively employed with an unstructured mesh topology. Flow problems were solved using a variety of Reynolds and Mach numbers and the impact of various values of the preconditioning parameter γ are investigated to show the impact of preconditioning on accuracy and convergence rates. All meshes were created with ANSYS ICEM meshing software (ANSYS Inc., Canonsburg, PA, USA).
The root mean square (RMS) of the ρu -momentum residual was used to monitor convergence to steady-state and is defined No of Cel l s (43) where t is the iteration index and R ( Q i ) is the ρu residual calcualated for cell i using Eq. (31) . The calculated RMS is normalised relative to the initial RMS value calculated at the end of the first iteration. i.e The convergence criterion employed is a five order magnitude reduction in the normalised residual, i.e. R RMS,Norm < 1 * 10 −5 (45)

3D Lid-driven cavity flow
Shear-driven flow in a square/cube cavity is a standard test case for validating predictions of incompressible viscous flow. It has historically been used as a benchmark flow problem to investigate the accuracy and the performance of CFD codes [49] . Lid-driven cavity flow is steady for a Reynolds number less than 10 0 0 0, with a moving lid developing the flow. The steady-state predictions by the PLBFS are compared to existing numerical results produced by Ku et al. [50] and Ding et al. [51] . The set up of the lid-driven cavity problem is illustrated in Fig. 5 . The top boundary moves with a velocity u lid . This is implemented using a Dirichlet boundary condition specifying u lid . A no-slip boundary condition is implemented at the remaining five boundaries. For density, a Neumann boundary condition, specifying zero change in density across the boundary, is implemented at all boundaries. The initial conditions are set as V = 0 and ρ = 1. Multiple test cases were run for this flow problem with different combinations of Reynolds number, Mach number, preconditioning parameter γ and mesh density. The test case parameters are summarised in Table 4 and were chosen as  per Table 3 .
Test Cases 1-8 were run using the mesh shown in Fig. 5 consisting of 90480 hexahedral cells. Test Cases 9-13, performed without preconditioning, were run on finer meshes consisting of 226981 and 531411 cells. The origin of the coordinate system is located at the centroid of the cube and the edges on the cube have a reference length L re f = 10 . Predicted velocity profiles on vertical and horizontal centrelines respectively in the z = 0 plane for Test Cases 1-13 are shown in Figs. 6 , 7 , 8 , 9 . The results are compared to the predictions of Ku et al. [50] and Ding et al. [51] . Overall there is good agreement with the literature by the preconditioned test cases and the test cases performed on finer meshes. To show the flow patterns of the 3D lid-driven cavity flow, 2D streamlines are projected onto three centroidal planes at x = 0 , y = 0 and z = 0 . The flow patterns are shown for Test Cases 4 and 8 where Re = 100 and Re = 10 0 0 respectively. These results are shown in Fig. 10 and show the development of stronger secondary vortices and a stronger 3D impact as the Reynolds number increases. This is in agreement with the predictions of previous studies by Ku et al. [50] , Ding et al. [51] and Wang et al. [14] . Convergence histories are plotted in Fig. 11 and Fig. 12 .
Using an unstructured hexahedral mesh produces excellent results that agree with predictions by other researchers in the literature. The previous study using a LBFS by Wang et al. [14] uses a 81 X 81 X 81 (531441 cells) non-uniform structured mesh and it's predictions have excellent agreement with the studies of Ku et al. [50] and Ding et al. [51] . Using the PLBFS on a fully unstructured grid allows similar levels of accuracy to be attained while using a mesh with only 90480 cells. The results also confirm the theoretical analysis in Section 3.2 . This analysis suggests that preconditioning should enable the use of large δx on lattices in relatively larger cells. This should reduce the interpolation error when applying Equation. Inspecting Table 4 shows that preconditioning allows the use of larger δx in the bigger cells in the mesh. Inspecting Figs. 6 to 9 show that the preconditioned case with the larger δx is more accurate than the unpreconditioned cases with smaller δx .
Figs. 6 to 9 also show that increasing the mesh density results in more accurate comparisons with the benchmark solutions. There is also significant convergence acceleration with the use of preconditioning at lower Ma numbers. The reduction in iterations required to reach convergence compared to the non-preconditioned case is 9.72x at Re = 10 0 0 and 5.62x at Re = 100 for Ma = 0 . 017 . At the higher Mach number of Ma = 0 . 17 , the decrease in CN is offset by disproportionately larger increase in vc, max (see Table 4 ). This accounts for the lack of significant acceleration in convergence at the higher Mach number. An alternative to preconditioning to improve accuracy is to increase the mesh density. Figs. 6 to 9 show that the preconditioned test cases produce results that are of similar accuracy to those of the unpreconditioned test cases on finer meshes. This is also with the benefit of a reduction in the iterations required to reach convergence and a reduction in the flux calculations per iteration. As can be seen in Fig. 11 and Fig. 12 , this effect is most prominent for high Re and low Ma . The preconditioned case of Re = 10 0 0 and Ma = 0 . 017 uses 5.87x less cells and 20.67x less iterations than the unpreconditioned case on the finest mesh. In this case preconditioning enables a 121x reduction in computational effort while attaining similar levels of accuracy.

3D flow over a circular cylinder
As mentioned previously, one of the motivations behind using the LBFS is to avoid the staircase approximation of curved boundaries with the traditional LBM. To demonstrate the applicability of the LBFS to flow problems with curved boundaries, 3D flow over a circular cylinder was investigated. This is a steady flow for low Reynolds numbers and so the PLBFS can also be used to solve this flow problem. The problem is set up as in Fig. 13 , with the cylinder immersed in a uniform freestream. The diameter of the cylinder is the reference length L ref with a value of L re f = 1  and spans the length of the z-domain. The boundary condition at the inlet boundary was set equal to the freestream density ρ ∞ and freestream velocity U ∞ using Dirichlet boundary conditions. At the outlet boundary condition the pressure was set equal to static pressure by a Dirichlet boundary condition and a zero change in velocity at the outlet was maintained by a Neumann boundary condition. A no-slip boundary condition was applied to the wall of the cylinder and the remaining boundaries all have a zero change in velocity and density which are maintained by Neumann boundary conditions. The initial conditions were set as V = 0 and ρ = 1.
The flow problem was run for different combinations of Reynolds number, Mach number and preconditioning parameter γ . The test cases parameters are summarised in Table 5 . The values of γ were chosen for each test case as per the approach given by Table 3 .
All test cases were run with a variety of meshes with densities of 7021, 13051, 18517, 23073 and 25877 cells. For each mesh, the cell height of the first cell adjacent to the cylinder wall was chosen as 0.05 and is one cell thick in the z-direction. The mesh containing 23073 cells is shown in Fig. 14 . For Re = 20 and Re = 40 , the flow is steady. The predicted streamlines and velocity contours Test Case 1 and 5 are shown in Fig. 15 . The streamtraces show stationary recirculation regions on the lee side of the cylinder. The flow pattern shown is consistent with the existing studies in the literature; however more detailed measures are required to demonstrate the performance of the PLBFS. To do this, a variety of parameters can be employed including the drag coefficient of the cylinder, the recirculation length and separation angle. The drag coefficient C d relates the force acting on a body by a fluid in the direction of the freestream velocity to the force of the freestream dynamic pressure acting on the frontal area of the body. This is calculated as follows:   (46) where A f is the frontal area (the area projected onto a plane normal to the flow direction) of the body and F d is the drag force which in this flow problem is calculated by integrating the pressure and viscous stresses acting in the x-direction over the surface of the cylinder, i.e.: The pressure and viscous stresses in Eq. (47) were calculated using Eq. (16) . The recirculation length L s is defined as the distance between the trailing edge of the cylinder and the stagnation point in the wake i.e. where V = 0 on the x-axis. The separation angle θ s is the angle between the trailing edge and the point where the  Fig. 16 . This shows that the PLBFS has an order of accuracy of approximately 2.3 for this test case and is at least second order accurate which is consistent with the work of Shu et al. [10] .
The results from the PLBFS are compared to previous numerical studies (Shu et al. [10] , Dennis and Chang [52] , Shukla et al.   [53] and Pellerin et al. [11] ) in Table 6 and  Fig. 20 . As shown in Fig. 20 , the reduction in iterations required to reach convergence compared to the non-preconditioned case is 4.92x at Re = 20 and 9.625x at Re = 40 for Ma = 0 . 017 . However there is an increase in iterations at the higher Mach number of 0.17. Due to the detrimental effect on convergence rates at Ma = 0 . 17 , further test cases were run to investigate this issue. These additional test cases were chosen as per Table 3 and are shown in Table 8 . Test Cases 9-12 used the same mesh with N cells = 23073 as before but with a more moderate level of preconditioning. The convergence histories of these additional test cases can be seen in Fig. 21 . These results show that any level of preconditioning has an adverse effect on convergence rates for Ma = 0 . 17 .
Using an unstructured hexahedral mesh produces excellent results that agree with existing studies in the literature. The previous studies using the LBFS were performed by Shu et al. [10] with a cell-centred non-uniform O-grid mesh and by Pellerin et al. [11] with a vertex-centred tetrahedral mesh. They achieve excellent accuracy using 60501 cells and 63541 vertices respectively. Using the PLBFS on a fully unstructured hexahedral grid allows similar levels of accuracy to be attained while using a mesh with 23073 cells. Inspecting Table 5 shows that preconditioning allows the use of larger δx max in larger cells in the mesh. This leads to more accurate results. It can be seen that Test Cases 3 and 7, which do not employ preconditioning, have δx max values that are an order of magnitude lower than the preconditioned cases. In both cases, δx max is less than the cell height of the first cell in the boundary layer. This results in drag coefficients, recirculation lengths and separation angles which are not as accurate as in the preconditioned cases. From the initial results in Fig. 20 , preconditioning has

Conclusion
In this paper a preconditioned lattice Boltzmann flux solver (PLBFS) has been successfully used to solve flow problems on 3D unstructured hexahedral meshes. This is demonstrated by successfully simulating 3D lid-driven cavity flow and 3D flow over a circular cylinder. The use of a unstructured hexahedral mesh topology is shown to enable the use of coarser meshes than those required using a structured mesh topology for similar levels of accuracy. This accuracy is also further increased by using preconditioning as it allows the use of a more optimal local lattice Boltzmann reconstruction in larger cells. This effect is demonstrated to be more significant at higher Reynolds numbers and lower Mach numbers. Preconditioning also has the benefit of accelerating convergence in the vast majority of cases. Again this effect is more beneficial at higher Reynolds numbers and lower Mach numbers as the unpreconditioned condition number is larger. It has also been shown that in certain cases, such as flow problems with low Reynolds numbers and very fine mesh refinement in the boundary layer as with flow over a circular cylinder, the acceleration due to preconditioning can be inimal. In such cases it is recommended to use moderate levels of preconditioning, so that the ratio of the viscous spectral radius to the convective spectral radius remains less than one, or to use a coarser grid where possible if sufficient accuracy can be achieved with such a mesh. The PLBFS has been shown to greatly enhance the use of unstructured meshes with the LBFS, particularly in flow problems with higher Reynolds numbers.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. where AR is the effective aspect ratio of the cell and is given by: r j is a position vector for the centroid of the neighbouring cell with shared face j , r i is a position vector for the centroid of the current cell i and V i is the cell volume of the current cell i . M i is a matrix that weights the contribution of each neighbouring cell to the least-squares gradient calculation and is given by: x , y and z are the changes in location along the Cartesian axes between the current cell's centroid and the j th neighbouring cell's centroid. α wj is an interpolation factor given by: where r k is the centroid of the face j and ω j is a weighting function given by: Finally x i j is given by: