Near-wall modeling of forests for atmosphere boundary layers using lattice Boltzmann method on GPU

In this paper, the simulation and modeling of the turbulent atmospheric boundary layers (ABLs) in the presence of forests are studied using a lattice Boltzmann method with large eddy simulation, which was implemented in the open-source program GASCANS with the use of Graphic Processing Units (GPU). A method of modeling forests in the form of body forces injected near the wall is revisited, while the effects of leaf area density (LAD) on the model accuracy is further addressed. Since a uniform cell size is applied throughout the computational domain, the wall-normal height of the near-wall cells is very large, theoretically requiring a wall function to model the boundary layer. However, the wall function is disregarded here when the forest is modeled. This approximation is validated based on the comparison with previous experimental and numerical data. It concludes that for the ABL conditions specified in this study as well as a large body of literature, the forest forces overwhelm the wall friction so that the modeling of the latter effect is trivial. Constant and varying LAD profiles across the forest zone are defined with the same total leaf area despite the varying one being studied previously. It is found that the two LAD profiles provide consistent predictions. The present forest modeling can therefore be simplified with the use of the constant LAD without degrading the model accuracy remarkably.


Introduction
When wind farms are located within forest terrain, the presence of trees are known to generate strong atmospheric turbulence, which can in turn lead to an adverse impact on the performance and fatigue life of wind turbines (Nebenführ & Davidson, 2017).In recent years, there has been increased focus on the siting of wind turbines within forests.This is in part motivated by the opportunity to reduce the visual and environmental noise footprint of turbines, since the forest canopy and background noise provide a natural screen.It is also driven by need, since around 30% of the Earth's land surface is covered by forest.It is thus increasingly relevant to find an effective method to simulate wind field under the influence of forests to help inform the siting at the early design phase of a wind farm.
The effect of forests may be considered as a kind of surface roughness which is modeled by introducing a roughness length (Lo, 1990).However, this method is only applicable to short vegetation.The permeability of CONTACT Xiao Xue xiaox@chalmers.sethe foliage should also be taken into account for relatively high forests.Shaw and Schumann (1992) presented the effect of forests in their simulations as a drag force applied by the trees on the flow and the drag force varies according to the varying leaf area density (LAD).Their method has been accepted by many researchers and has been used to investigate many aspects of canopy flows, for example, coherent structures in canopy flow (Dupont & Brunet, 2009;Finnigan et al., 2009;Nezu & Sanjou, 2008), the influence of LAD on canopy flow (Dupont & Brunet, 2008b) and edge flow due to heterogeneity in forest structure (Dupont & Brunet, 2008a).Large-eddy simulation (LES) has proven to be an useful tool to unravel the engineering problems such as wind farms flow prediction (Stevens et al., 2014(Stevens et al., , 2015;;Stevens & Meneveau, 2017).The atmospheric boundary layer (ABL) can be simulated using conventional finite difference methods (FDM) or finite volume methods (FVM), for instance, PALM (Maronga et al., 2015), ICON (Dipankar et al., 2015), and MicroHH (Van Heerwaarden et al., 2017).There are many studies which have endeavored to combine LES with forest models.Shaw and Schumann (1992) successfully represented a forest as a drag layer and heat sources in their LES.Shaw and Patton (2003) further refined the forest model by including skin friction effect under the LES framework.
LES is commonly adopted in the Navier-Stokes (NS) equation framework.Despited LES has been largely reduce the computational cost comparing to the direct numerical simulations(DNS), however, it is still computationally challenging due to the data-dependency nature raised by the conventional CFD methods.LBM can improve the parallel efficiency since each lattice cells information is stored and updated locally (Krüger et al., 2017).Studies have been conducted to involve LES into the framework of lattice Boltzmann models (LBM) (Hou et al., 1994).LBM is thus seen as a good alternative to more common methods based on NS solvers.In the past decades, LBM has become more widespread on account of its broad applicability across a range of complex fluid dynamics problems ranging from micro-nano scales (Belardinelli et al., 2015;Xue et al., 2020Xue et al., , 2018) ) to macroscopic scales (Filippova et al., 2001;Toschi & Bodenschatz, 2009;Xue et al., 2022) under low Mach numbers.A recent study combined LBM and LES to simulate ABL flows (Feng et al., 2021).They simulated neutral, stable and convective ABLs over a forest in reference to the cases in Nebenführ and Davidson (2015), where LES implemented with a FVM was used.Their simulations included the forest effects using a forest model.Furthermore, since the mesh resolution near the ground wall was not sufficient to resolve small-scale turbulent fluctuations, the effect was recovered using the Monin-Obukhov wall model.Their simulations results showed good agreement with the reference data in Nebenführ and Davidson (2015).However, the detailed comparison between the varying LAD and constant LAD is not presented.Also, most LES LBM computation are based on multi CPU architectures (Harwood, O'Connor et al., 2018;Krause et al., 2021;Latt et al., 2021), the intrinsic nature of parallel-friendly nature makes LBM well-suited for GPU architectures.
A difficulty in the application of LBM on GPUs is that although various mesh refinement techniques have been proposed, they suffer from reduced computational efficiency.Another concern is that as lattice points remain in a cubic structure, the mesh refinement is generally limited to octree based methods with a scale of 1-2 in each direction.It means that high aspect ratio cells, which are often used in wall-bounded turbulent flows, are generally not possible.To avoid loss of the computational efficiency, meshes with uniform cells can be adopted for ABL simulations on GPUs.However, a wall function in principle should be used because the near-wall mesh resolution is not sufficient to resolve small scale flow fluctuations.When the forest effects near ground walls are modeled, questions arise about whether the forest drag force is dominant and whether the wall function can be neglected.Moreover, it is unclear whether the accuracy of the forest modeling is sensitive to the LAD profile.These problems motivate the present study.We will investigate near-wall modeling to implement a uniform cell size in the whole computational domain, replacing the local mesh refinement.The effects of the wall function and forest model will be addressed.The influence of the LAD profile will be studied.

The lattice Boltzmann method
The flow is described based on the mesoscale lattice Boltzmann equations.The classical Bhatnagar-Gross-Kroog (BGK) LBM collision operator and D3Q19 model are adopted for the LBM.This model is a threedimensional lattice model with 19 discretized velocity directions c i (i = 0 • • • Q − 1).We consider f i (x, t) as the discretized particle's probability distribution function on the ith direction of a lattice cell.The lattice cell is located at position x at time t.The LBM governing equation for the fluid, considering collision and forcing, can be written as: where is a collision kernel (Krüger et al., 2017;Succi, 2001), t is the marching time step, F i is the volume force acting on the fluid following the approach from Guo et al. (2002).
The collision kernel is the classical BGK collision kernel which has been widely adopted to various applications (Succi, 2001).The BGK collision operator fixes the single relaxation time τ for the colliding process.Thus, the collision operator can be represented as: The collision kernel relaxes the distribution function towards the local equilibrium f eq i : where ω i is a suitable weight needed to impose the isotropy in the interaction, ρ(x, t) and u(x, t) are the macroscopic hydrodynamic quantities for density and velocity, respectively.The volume force F i (x, t) in Equation ( 1) can be obtained by: where F is the volume acceleration.The macro-scale quantities for the density, momentum and momentum flux can be calculated from the distribution function, the discrete velocity, and the volume force: ( 5 ) where the momentum flux can be defined as (x, t) = eq (x, t) + neq (x, t), where non-equilibrium parts are defined as: is the non-equilibrium distribution function which can be obtained via

Smagorinsky subgrid-scale model
The basic idea of LES is not only to resolve relatively large scales, but also to use a subgrid-scale (SGS) model to estimate unresolved small scales.The detailed implementation of the Smagorinsky SGS model can be found in Hou et al. (1994) and Koda and Lien (2014).Here, we briefly summarize the essential part of this model.The key of the LES modeling of Smagorinsky is to model the effective viscosity ν total , which can be seen as the sum of the molecular viscosity ν 0 and the turbulent viscosity ν t : where ν t is represented by: where C smag is the Smagorinsky constant, represents the filter size, and | S| is the filtered strain rate tensor, which can be obtained by: where τ 0 is the original relaxation time from the input, c = x/ t, and Q 1/2 can be written as: where neq is the non-equilibrium part of the momentum flux tensor shown in Equation ( 8).Following (Koda & Lien, 2014), we can obtain the total relaxation time τ total , which is written as: Finally, we replace τ in Equation ( 2) with τ total and obtain the Smagorinsky SGS collision kernel.

Forest model near the wall
We model the effects of the forest by introducing an additional drag force near the wall.The drag force is modeled with the help of the forest drag coefficient C D , the LAD a f and the local wind velocity u = (u, v, w) (Shaw & Schumann, 1992).Thus, the drag force of the forest, which is added in the streamwise direction, can be written as: y, z, t) u(x, y, z, t), (14) where y refers to the vertical distance to the wall and u(x, y, z) is the streamwise velocity.In this study, for the purpose of comparison with the previous study (Bergström et al., 2013), C D is set to 0.15, which has been adopted in their work.
According to Lalic and Mihailovic (2004), the LAD distribution a f (y) can be described by the function: where The parameter h is the total height of the forest, and y m is the location where a f (y) gets its maximum value a f m .

Numerical implementation
The simulations presented in this paper are performed using the open source CFD software GASCANS (Santasmasas, 2021), which implements LBM on multiple graphic processing units (GPUs) using CUDA and C++.
The main aim of GASCANS is to provide fast and accurate simulation of turbulent flows over complex geometries.To further increase its usability, it incorporates a synthetic eddy method (SEM) to generate turbulent velocity fluctuations from mean flow data and GASCANS can be coupled at run time with another CFD software (Santasmasas, 2021).It also implements immersed boundary method (IBM) and a structural solver for the simulation of filaments submerged in fluid.
The flow solver implements the BGK collision scheme, the LES Smagorinsky turbulence model and forcing scheme described in Section 2. The algorithm follows a merged stream-collision time loop and stores the particle distribution functions using a double-population scheme (Latt et al., 2021).The simulation data is stored in a mesh formed by equally sized cubic cells using a structure of arrays (SOA) pattern.
Figure 1 shows the GASCANS architecture and interactions between its components; the features not used for this paper are grayed out.The code is divided into two sections: the Application contains the user interface and inputs, while the Library contains the core software.This structure allows for the Application to be reworked or substituted to suit the user's needs without modifying the core of GASCANS.The user inputs are through the Parameters class, and the terrain and forest geometry is read from point cloud files.See (Santasmasas, 2021) for a detailed explanation of the GASCANS functionality and structure.The point cloud file contains a list of coordinates covering the forest's volume.GASCANS reads the file and assigns a code to each cell that contains one or more forest coordinates.The code for each forest cell contains the following information: • h: height of the forest, calculated as the bounding box of the forest object in the vertical direction.• y: distance in the vertical direction between the lower bounding box of the forest object and the current cell.• y m : distance in the vertical direction between the lower bounding box of the forest and the maximum value of the LAD distribution.
This information is stored into the forest cell's code using a bit mask to reduce the GPU memory required to store the information.Algorithm 1 shows the forest implementation in GASCANS.

Validation of simulation platform
Channel flow is a typical case of wall-bounded turbulence.The accuracy of the simulation platform, which uses the LES with the Smagorinsky SGS model, is validated based on the database of direct numerical simulation (DNS) at Re τ = 180 in Moser et al. (1999).
The computational domain is shown in Figure 2. The domain dimensions are non-dimensionalized based on the height of H = 1 meter.That is, the dimensions are L x × L y × L z = 12 × 2 × 4 m 3 in the stream- wise, vertical and spanwise directions.The mesh consists of cubic lattice cells.There are 31 cells per meters.Correspondingly, the mesh contains 372 × 62 × 124 lattice cells.
The upper and lower walls are set with the no-slip boundary condition, and the remaining boundaries of the computational domain with the periodic boundary condition.The flow is driven by a volume body force from Equation (4), which is defined as: The body force acts equivalently as the pressure gradient that balances the wall shear stress in the fully developed channel flow.The parameters set for the simulation are listed in Table 1.Here C smag is the Smagorinsky model constant, as shown in Equation ( 10).The dimensionless density ρ is set to unity in the lattice Boltzmann unit (LBU).The fluid is incompressible.The channel flow simulation is initialized with the zero velocity field driven by the body force F given in Table 1.A small block of 15 × 15 × 64 m 3 is placed at x = 1 m in the middle of the spanwise width.The simulation with the block is first performed for 16.5 periods to develop turbulence.Here one period denotes the time that the flow takes to pass through the domain.Then, the block is removed, and the simulation continues for another 82 periods before computing the spatial-averaged statistics.
Figure 3 shows the instantaneous streamwise velocity distribution and the Q-criterion of Q = 0.0002/s 2 .A plane is placed at the vertical position of y + = 9 to visualize streamwise velocity contours.Streaks featuring the boundary layer are observed.This is consistent with the previous DNS (Moser et al., 1999).Plentiful turbulent structures are identified within the flow field based on the Q-criterion.Hairpin vortices exist near the walls, leading to streak contours of the streamwise velocity.As shown in Figure 4, the statistics of the flow quantities from the present LBM LES method are compared to Koda's LBM LES data (Koda & Lien, 2014), and the FVM DNS by Moser et al. (1999).The non-dimensional time-averaged streamwise velocity is defined as u + = u /u τ .Here • is the averaging operator, and u τ denotes the friction velocity.The wall normal distance is defined as y + = yu τ /ν.The averaging is done in time and then in x-z planes that are parallel to the top and bottom walls.The non-dimensional root mean square (RMS) velocity components (u + rms , v + rms and w + rms ) are also normalized with the friction velocity u τ .The present results are consistent with Koda's LBM LES data, especially in the near wall region.But these results exhibit discrepancies in comparison with Moser's DNS data.It is observed that the first layer cell heights near the wall are y + 1 = 3 in the LES cases, indicating that the resolution of the LES mesh is not fine enough to resolve the boundary layer.This is the reason why both LBM LES methods underestimate u + as compared to DNS.Due to the same reason, the RMS values of the velocity components predicted in the LES cases are less potent than the DNS.It suggests that a   wall function or similar wall treatment should be applied to the near wall region to remedy the underestimations.We therefore use a forest model for the wall treatment in the ABL simulations in the next section.Nonetheless, the largest discrepancy of u + is less than 5%, and the largest RMS velocity difference less than 10%.

Case description
A classical ABL case reported in Nebenführ andDavidson (2014, 2015) is studied using the present simulation platform.The computational domain is shown in Figure 5.The physical dimensions are L x × L y × L z = 1000 × 400 × 800 m 3 , which was also used and validated in Nebenführ andDavidson (2014, 2015).To produce the representative forest effects, a forest with a homogeneous distribution over the ground is assumed (Nebenführ & Davidson, 2014).The height of the forest is h = 20 m.The freestream flow direction is set along the x axis.The forest friction velocity is defined as , 2014, 2015).The flow is driven by a constant non-dimensional body force of 2.38 × 10 3 per unit volume on the basis of u * and L x .The force is imposed in the whole domain.At the bottom wall below the forest, a no-slip boundary condition is applied.The inlet and outlet are set with the periodic boundary condition, and the spanwise side boundaries are also set with the same boundary condition.The upper boundary is specified with a forced equilibrium boundary condition (Latt et al., 2008).The validity to apply this boundary condition will be addressed in the following discussion.Also, the upper wall is sufficiently far away from the bottom, and only the boundary layer region near the bottom wall where wind turbines are embedded are of interest for the wind energy harvesting, this forced equilibrium boundary condition is suitable for the present ABL case.In reality, the LAD a f varies with the vertical height of a tree, as the tree crown has dense leaves and the trunk is less obstructing.In the current work, we investigate three distributions of LAD, as listed in Table 2. Case 1 has constant LAD in the vertical direction, indicating that the leaf and obstructions are uniform along a tree.Case 2 has varying LAD, which is more similar to the real situation (Nebenführ & Davidson, 2014, 2015).Case 3 has no forest force imposed (i.e.constant LAD of a f = 0).However, the LAD distributions of cases 1 and 2 have the same total obstruction, which is defined as A f = h 0 a f (z) dz, whereas case 3 has no obstruction.The LAD distributions of cases 1 and 2 are illustrated in Figure 6.In case 1, a f = 0.215.In the varying LAD case, a f is set on the basis of Equation ( 15) with the coefficients z m = 0.7h and a f m = 0.38.
Moreover, a sensitivity study on the forest drag coefficient C D is performed for the varying LAD with C D = 0.13, 0.15, and 0.2.The normalized time-averaged streamwise velocity as function of y/h is shown in Figure 7.All results almost overlap with each other, indicating that the model is not sensitive to C D .Therefore, choosing C D = 0.15 in the present study is robust for the following analysis.
To develop the turbulence, three small boxes are set near the inlet in the initial stage of the simulations.The non-dimensional box size is 20 × 20 × 20 m 3 .Three boxes are uniformly distributed along the spanwise direction.After 15 periods of the flow passing through the computational domain when turbulence is developed, the boxes are removed from the computational domain.The simulation is then run for additional 80 periods to develop turbulence.The time-averaged statistics are obtained by averaging the data for 100 periods.
As shown in Figure 8, the normalized instantaneous velocity gradient near the upper boundary is overall less than 0.012.Here the normalization is based on the velocity of the equilibrium boundary condition and the unit meter.It indicates that the change in velocity per meter is below 1.2% relative to the given velocity, and the flow close to the upper boundary is nearly in equilibrium in both time and space.

Mesh independence study
The mesh quality for the simulation accuracy is examined with the meshes with coarse, medium and fine cell size.The constant LAD of a f = 0.215 is adopted for the forest modeling.The information of the meshes are presented in Table 3.The coarse mesh has the number of cells in the three dimensions as N x × N y × N z = 200 × 80 × 160, resulting in 2.6 × 10 6 cells in total.We define the refinement ratio as the ratio of a refined cell size to the reference cell size.In reference to the coarse mesh, the refinement ratios of the medium and fine meshes are 0.8 and 1/1.75 ≈ 0.57.
The computational time and memory costs relevant to the meshes are also reported in Table 3.The GPU, GeForce RTX 2070 SUPER, with a total memory of 8 GB is used for the computations.The coarse mesh simulation consumes the computing time of approximately 8.5 hours per 1 × 10 6 time steps and the memory of nearly 1.44 GB.For the medium mesh simulation, the computing time is 1.65 times larger than the coarse mesh, and the memory cost is 1.5 times larger.The fine mesh simulation is 4.59 times faster and 4.5 times more memory consumption.The increasing ratios of the computing time are comparable with those of the memory costs.On the other hand, the total number of cells in the medium and fine meshes increase with ratios of 1.95 and 5.36, which are larger than the ratios of the computing time and memory costs.It suggests that the increased mesh size improves the computational efficiency.The reason is that there are modules responsible for processing and storing solution data in the present simulation platform.The additional computational and memory costs from these modules are independent on the mesh size, so they are more weighted in the coarse mesh case, as compared with the other  refined mesh cases.Moreover, the GPU solves the LBM equations on all cells at the same time.An increased number of cells has less impact on the computing time if the former coarse mesh does not fill all capacity of the GPU.Thus, GPU is well suited for the LBM-based large scale applications, although most LBM applications carried out on GPU is memory-bounded (Harwood, Wenisch et al., 2018, June 11-15, Glasgow, UK;Santasmasas, 2021).However, new techniques have been developed to improve the memory access on GPU (Obrecht et al., 2014(Obrecht et al., , 2016)).Despite relatively smaller CFL numbers needed to maintain the numerical stability of LBM as compared to FVM, solving LBM has proven to be faster than solving the Navier-Stokes equations due to its intrinsic low data dependency, that is, each lattice cell can be updated locally without a need to wait for the neighbor cells' update (Krüger et al., 2017;Marié et al., 2009).
The normalized averaged streamwise velocity along the wall normal direction for the simulations of the three meshes is shown in Figure 9.The averaging is done in time and then in x−z planes that are parallel to the bottom wall.Hereafter, the averaged quantities reported in this study are obtained using the same averaging method.As compared to the experimental data from Bergström et al. (2013), the fine mesh simulation shows the best agreement.In the range of the experimental data, the coarse mesh simulation gives better prediction than the medium mesh simulation.But these two simulations become similar above y/h = 14, which is far from the bottom wall and forest.The profile of the coarse mesh simulation rises more slowly for y/h > 2. It implies that there are less turbulent vortices resolved by the coarse mesh.
The Reynolds stress tensor components, u u , v v , w w and u v , normalized with the averaged wall friction velocity, u * , are shown in Figure 10.It is observed that the coarse mesh leads to the worst prediction of all components in reference to the experimental data from Bergström et al. (2013), whereas the prediction of the fine mesh is most consistent.Therefore, a general trend is that the mesh refinement improves the simulation accuracy.For the streamwise component u u /u 2 * in Figure 10, the slope of the profile from the coarse mesh simulation is more straight than the other meshes.This is a typical phenomenon when the turbulent flow is less developed.As a consequence, velocity fluctuations owing to turbulence vortices are insufficient to form a steeper profile.This effect is also found for the wall normal component v v /v 2 * and the spanwise component w w /w 2 * , but it is not so significant as u u /u 2 * .Since streamwise velocity fluctuations have larger magnitudes than spanwise and wall normal velocity fluctuations, the streamwise velocity ones are more influential in the formation of the Reynolds stress component u v .As seen in subfigure 10d, the profiles of u v /u 2 * show similar trends to u u /u 2 * .The fine mesh results in overall the best prediction, as indicated in Figures 9 and 10.Based on Table 3, the averaged first-layer cell height over the wall is y+ = 165.Moreover, the results of the medium and fine meshes are nearly the same.This converged prediction means that both medium and fine mesh resolutions are sufficient to simulate the current ABL case.According to the scalability of the computational time and memory costs with respect to the mesh size found in Table 3, the fine mesh has the best performance in computation.Since the fine mesh resolves more small turbulence scales than the medium mesh, it is chosen in the following simulations and analysis although more computational costs are required.
Based on Table 3, the averaged first-layer cell height over the wall is y+ = 165.This infers that the wall modeling should be used in principle.However, the practice in the current simulations show that good agreement is achieved without using a wall function.The reason is that the forest forces are much larger than the wall friction.This effect will be discussed in the subsequent section on the forest modeling.

Effects of forest modeling
To investigate the effects of the forest modeling, the simulation cases with different LAD are defined in Table 2. Case 1 (the modeled forest with the constant LAD) has been discussed in the preceding section, and it is found producing consistent prediction in comparison to the experiments.
For case 2 (the modeled forest with the varying LAD) and case 3 (zero LAD, i.e. no modeled forest), the instantaneous normalized streamwise velocity u/u * are shown in Figure 11.Case 2 exhibits generally much smaller velocity fluctuations than case 3 in the whole computational domain.Fluctuations are visualized at a x−z plane of y/h = 0.5, which is the center position of the modeled forest zone.Velocity magnitudes in this plane in both cases are smaller than those in the region far from the bottom wall.As case 2 includes the drag forces from the forest modeling and from the wall friction, the small fluctuations within the forest zone are explainable for this case.There are no forest drag forces imposed in case 3.In other words, case 3 does not have a forest zone.Its near-wall flow is only dragged by the wall friction.Therefore, the forest dragging effect is dominant in reducing the velocity of the ABL flow, as compared with the wall friction.It is observed in case 2 that the velocity quickly increases to larger values outside the forest zone.This also qualitatively indicates that the significant dragging effect from the modeled forest.The same finding was reported in Nebenführ andDavidson (2014, 2015).
The vertical profiles of the averaged streamwise velocity u /u * in all simulated cases listed in 2, as well as the FVM LES with the same varying LAD setting as case 2 in Nebenführ andDavidson (2014, 2015) and the experiment data in Bergström et al. (2013), are plotted in Figure 12.In the near ground zone of y/h < 6 where large part of the boundary layer is located, cases 1 and   Nebenführ andDavidson (2014, 2015) and the experiments by Bergström et al. (2013).
2 exhibit good agreement with the experimental data.Both FVM LES and LBM LES agree with the experimental data for y/h < 5, but the discrepancies between them are observed for larger heights.The reason is that the flow simulated with the FVM LES has a larger bulk velocity than the other one with the LBM LES.It is worth noting that the bulk velocity in the atmosphere is technically difficult to measure and, therefore, was not obtained in the experiments.The flow parameters in the simulations have been adjusted to achieve the agreement with the experimental data.As turbulent fluctuations in flows at high Reynolds numbers show universal statistics, the current flow simulation is also suitable to provide enough information about the turbulence statistics.This will be analyzed in the following discussion.For example, in Figure 13 where the two methods produce similar RMS values of velocity fluctuations.The mesh for the FVM LES contains 10 cells in the forest zone.It has been demonstrated that the mesh resolution is sufficient to resolve the profile of the varying LAD illustrated in Figure 6 (Nebenführ & Davidson, 2014, 2015).There are 7 cells in the fine mesh for the current simulations, which also resolves the same varying LAD profile.The comparison of cases 1 and 2 indicates that the constant and varying LADs lead to slight differences in the prediction of the mean velocity in the forest zone and most of the zone above the forest.
In case 3 that has no forest, the boundary layer development is completely different from cases 1 and 2 considering the forest effects.The velocity gradient in the forest zone in case 3 is much larger than the other cases.A similar phenomenon was also reported in Nebenführ andDavidson (2014, 2015).The reason is that the forest model introduces significant drag forces into the flow.Based on Equation ( 14), the forest drag force is independent from the wall friction.Its vertical distribution is not governed by the classical boundary layer statistics, which can be resumed using a wall function.Although the wall function is ignored, only considering the forest model is still able to predict the mean velocity profile matching the experiments.This is explainable because the forest force overwhelms wall friction.
Figure 13.The stress tensor components in the present simulations with different forest modeling settings, as well as from the simulations using the FVM LES by Nebenführ andDavidson (2014, 2015) and the experiments by Bergström et al. (2013).
The vertical profiles of the normalized stress tensor components are displayed in Figure 13.In the simulations set with the forest model (cases 1 and 2, as well as the FVM LES in Nebenführ & Davidson, 2014, 2015), the predicted trends of u u are consistent.But a mismatch between these methods and the experimental data is observed in the near forest zone of 2 < y/h < 4. Cases 1 and 2 also show v v and u v comparable with the experimental data.The different LAD settings of cases 1 and 2 lead to discrepancies between their results.But the results are approximately identical at the first cell height.Recalling Equation ( 14), the forest drag force modeled is proportional to the local kinetic energy.Since the velocity at the wall is zero based on the no-slip boundary condition, the local forest drag force diminishes there.The small velocity near the wall leads to the negligible force, although the constant LAD is much larger than the varying LAD near the wall (see Figure 6).As the majority difference of the two LAD settings are defined near the wall for y/h less than 0.4, their influence on the drag force production is limited owing to the small local velocity.As a consequence, the profiles of the tensor statistics simulated with the two LAD settings are similar.Nevertheless, the present study only demonstrates the forest effect modeled with the LADs shown Figure 6.If the LAD definition is changed, the forest effects could change and, thus, should be thoroughly examined.
As the forest drag force is not modeled, case 3 gives different prediction in comparison with cases 1 and 2, in particular, for u u /u 2 * displayed in Figure 13.Obvious differences are identified for all stress tensor components within the forest zone, and u u /u 2 * also shows inconsistency at vertical locations far from the forest zone.Therefore, the mismatch conveys that the modeled forest drag force play a critical role in predicting the statistics of flow fluctuations in the boundary layer, in addition to its importance for the mean flow statistics that is shown in Figure 12.As a summary, from Figures 12 and 13, we can see that both constant and vary LAD agrees well with the experiment data at the height of wind turbine.This implies the possibility to extend the current model with wind turbines models.One typical model for wind turbines is the actuator disk model(ADM) (Abkar & Porté-Agel, 2016;Strickland et al., 2022;Yang & Sotiropoulos, 2013); it simplifies the wind turbine as a disk to reduce the computational cost.The other existing model is named as actuator line model(ALM) (Ravensbergen et al., 2020).This model has more detailed information for the wind turbines and can perform results with higher accuracy.Interest readers can refer to Stevens et al. (2018), who compared the ALM and ADM to show the difference between these two models and the wind tunnel experiment.With the models mentioned above added to our present study, it would be interesting to study the possible placement or optimization of wind turbine distribution in a wind farm in the presence of forest terrain.

Conclusions
A GPU-based simulation platform using an LBM method for LES has been proven to be suitable for computing ABL flows with the forest presence, which are of interest in the site selection of wind turbine farms.The forest effects are modeled by defining a volume drag force in relation to the local velocity and LAD.The way of implementing the force for LBM is presented in detail.However, to refrain from influencing the GPU computational efficiency, the meshes are not locally refined in the method developed in this study.Uniform-sized lattices are generated in the whole computational domain.This leads to large y+ near the wall, which in principle requires a wall function.We introduce the approximations near the ground wall that the wall function is not taken into account, and that the forest effects are considered.
The near-wall approximations are made because the forest drag force overwhelms the wall friction.This is validated based on the qualitative and quantitative analysis of the results.The statistics of the mean flow velocity and stress tensor components are well consistent with the previous FVM-LES simulations and experiments.The mesh independence study illustrates that the modeled forest zone is resolved with 7 cells.
One advantage of the near-wall approximations is that the LBM program is compatible with the GPU architecture, so the GPU speedup is preserved.Moreover, the wall treatment is ignored to reduce the computational cost.The platform extends the capabilities of the open-source simulation tool GASCANS developed at the University of Manchester.
The forest effects are examined for a classic vertically varying LAD distribution, which was simulated in previous studies using the FVM-LES, and a constant distribution possessing the same total area as the varying one.The results between these two LADs differ only slightly.Nevertheless, it would be interesting to investigate the effects of the LAD vertical profile on the characteristics of ABL flows such as time-space correlations (Yao et al., 2008).In the present study, the LAD model is investigated with specific parameters such as a constant height, while the height can be changed in the reality.An interesting future work is thus to explore the range of the parameters that are suitable for the modeling.

Figure 1 .Figure 2 .
Figure 1.Illustration of GASCANS architecture and relationship between its components.The solid arrows indicate aggregation (i e. the classes at the beginning of the arrow are part of the class that the arrow points to) and the dashed arrows indicate primary communication.The isolated boxes represent definitions and utilities accessible by the core LB solver.

Figure 3 .
Figure 3. (a) The instantaneous non-dimensional streamwise velocity, u/u τ , of the turbulent channel flow at Re τ = 180.The bottom plane shown here is located at y + = 9.(b) Isosurfaces of the Q-criteron of Q = 0.0002/s 2 , which are colored with the streamwise velocity.

Figure 4 .
Figure 4.The comparison of the present LBM LES, Koda's LBM LES (Koda & Lien, 2014) and the FVM DNS (Moser et al., 1999).(a) The non-dimensional averaged streamwise velocity u + as a function of y + .(b-d) The non-dimensional RMS velocity components in the streamwise, wall normal and spanwise directions.

Figure 5 .
Figure 5.The computational domain of the ABL.The green color marks out the zone where a homogeneous forest with a height of h = 20 m is modeled above the ground.The freestream flow moves along the x-axis.

Figure 6 .
Figure 6.The distributions of LAD in two situations such as constant LAD of a f = 0.215, indicated by the dash line, and varying LAD with a f following Equation (15) with z m = 0.7h and a f m = 0.38, indicated by the solid curve.The green colors the area of a f = 0.215, which is equal to the area of the varying LAD case colored in red.

Figure 7 .
Figure 7.The normalized time-averaged streamwise velocity computed using the varying LAD with the different forest drag coefficient C D .

Figure 8 .
Figure 8. Snapshots of the normalized instantaneous velocity gradient near the upper boundary at different time.Here T represents the time period of the flow passing through the computational domain.

Figure 9 .
Figure 9.The averaged streamwise velocity obtained from the present simulations with the constant LAD and from the experiments by Bergström et al. (2013).

Figure 10 .
Figure10.The normalized Reynolds stress tensor components computed with the three meshes and reported in the experiments byBergström et al. (2013) Here the constant LAD is set in the forest modeling.

Figure 11 .
Figure 11.A snapshot of the instantaneous normalized streamwise velocity u/u * .(a) The simulation with the forest modeling of the varying LAD; (b) the simulation with no modeled forest.The bottom surface visualized here is located at y/h = 0.5, which is the middle of the forest zone.

Figure 12 .
Figure12.The averaged streamwise velocity in the present simulations with different forest modeling settings, as well as from the simulations using the FVM LES byNebenführ and Davidson (2014, 2015)  and the experiments byBergström et al. (2013).

Table 1 .
The simulation parameters for the turbulent channel flow.

Table 2 .
The simulated ABL cases in terms of the LAD.

Table 3 .
The meshes and respective computational costs in the mesh independence study.