Evaluation of an offshore wind farm computational fluid dynamics model against operational site data

Abstract Modelling wind turbine wake effects at a range of wind speeds and directions with actuator disk (AD) models can provide insight but also be challenging. With any model it is important to quantify the level of error, but this can also present a challenge when comparing a steady-state model to measurement data with scatter. This paper models wind flow in a wind farm at a range of wind speeds and directions using an AD implementation. The results from these models are compared to data collected from the actual farm being modelled. An extensive comparison is conducted, constituted from 35 cases where two turbulence models, the standard k-e and k-ω SST are evaluated. The steps taken in building the models as well as processes for comparing the AD computational fluid dynamics (CFD) results to real-world data using the regression models of ensemble bagging and Gaussian process are outlined. Turbine performance data and boundary conditions are determined using the site data. Modifications to an existing opensource AD code are shown so that the predetermined turbine performance can be implemented into the CFD model. Steady state solutions are obtained with the OpenFOAM CFD solver. Results are compared in terms of velocity deficit at the measurement locations. Using the standard k-e model, a mean absolute error for all cases together of roughly 8% can be achieved, but this error changes for different directions and methods of evaluating it.


Introduction
Offshore wind power is growing rapidly, not only within the UK and EU but is beginning to gain traction around the world in other markets including Taiwan (Yue et al., 2019), China (Poulsen and Hasager, 2017), (Yang et al., 2017) and the United States (Chipindula et al., 2018;Donovan et al., 2018;Musial, 2015;Wiseret al, 2015;Mills et al., 2018). To capture the maximum energy from an offshore wind farm, the most reliable assessment needs to be made of the potential wind farm before it is constructed. The models need to be able to assess the wind behaviour for the entire wind farm, rather than for a single turbine, to be able to get the most accurate picture of how the wakes interact within the wind farm and conduct further analysis such as structural modelling (Gentils et al., 2017;Rasheed et al., 2014;Wang et al., 2016bWang et al., , 2016aWang et al., , 2016cZhu et al., 2019).
A variety of methods exist and have been used in literature for modelling the wake of turbines within a wind farm for horizontal or vertical axis turbines. These methods include Navier-Stokes Computational Fluid Dynamics (CFD) solvers where the turbine is modelled as an Actuator Disk (AD) (Port� e-Agel et al., 2014)- (Richmond et al., 2018) to more computationally intensive models of fully resolved wind turbines (Wang et al., 2016b), (Stergiannis et al., 2016a), (Zahle et al., 2018), (Rodrigues and Lengsfeld, 2019). At the most computationally intensive end of the methods are fully resolved wind turbine. These can provide accurate answers and predict the flow realistically relative to less detailed methods but are limited to modelling only short time intervals, relative to the lifetime of the wind farm, of a few seconds due to their computationally intensive nature. There are a wide range of Navier-Stokes (NS) based approaches to choose from, actuator line is a common approach for representing the turbines which takes the idea of an actuator disk but applies it to a rotating line in a transient analysis (Port� e-Agel et al., 2014), (Mehta et al., 2014), (Kalvig et al., 2014). The simplest NS based models are porous disks or momentum sinks which can provide good estimates but lack detail in their assessments (Johnson, 2015), . AD is a low to medium fidelity approach where the turbine is modelled as a permeable disk whose loading influences the flow field (Annoni et al., 2014). The load can be evaluated either through looking up aerodynamic values for the blade or through solving Euler or Navier-Stokes equations over a discretised field (Wang, 2015). The advantage of AD is its relative simplicity and reduced computational cost due to not having to resolve the boundary layer around the blade (Port� e- Agel et al., 2014). However, due to the inviscid assumption of the model it is inaccurate in predicting downstream wake because it under-predicts the turbulence required for wake dissipation (Stergiannis et al., 2016a), (Annoni et al., 2014). For evaluating wind load variation within a wind farm, AD appears to be a good compromise method between engineering, or analytical, models and fully resolved rotors. The method is mature, at least 30 years old, with a variety of implementations. Researchers can either be interested in near wake or in the wake deficit and studies have shown that AD produces good agreement with measurements regarding wake deficit (Kalvig et al., 2014), (Johnson, 2015). The turbulence model used can have a significant impact on both the accuracy and stability of the solution and can be either steady-state or unsteady. Comparisons of these have been made by other researchers (Kalvig et al., 2014), (Cabez� on et al., 2011), (Castellani et al., 2013). Different levels of turbulence lengths scales might be modelled, for example many researchers have used Large Eddy Viscosity (LES) to model turbulence which models large turbulence length scales while averaging smaller length scales (Port� e-Agel et al., 2014), (Mehta et al., 2014), (Ivanell et al., 2007). However, a less computationally expensive option is to model the turbulence using Reynolds averaged Navier-Stokes (RANS) approaches based on eddy viscosity turbulence models including different implementations of k-ε , (R� ethor� e andSørensen, 2009), (R� ethor� eet al, 2007) and k-ω (Johnson, 2015), (Wang et al., 2016d) turbulence models. As these researchers have found, using the right turbulence model can have a large impact and it's not always clear from the start which model is a better choice.
Comparisons have been conducted in wind tunnel conditions (Kalvig et al., 2014), (R� ethor� e et al., 2011), (Stergiannis et al., 2016b), (Stevens et al., 2018) because those are easier to measure and compare to than full wind farms, however real wind farms have much more complex flow conditions. Some authors have modelled full wind farms but they've done so only at a very limited range of wind directions and speeds if there is any range at allfor example (Ivanell et al., 2007) was conducted at 7 directions but appears to have only been modelled at one far-field velocity while (Churchfieldet al, 2012) was only conducted at one direction. To accurately assess the wind resource, it is not enough to simply assess wind flow at one input condition but rather over a range of conditions. These conditions can include inflow velocity and direction, because even a small change in either of these can have large implications for the wind farm. Some researchers have conducted wind farm level research at multiple inflow directions (Javaheri and Cañadillas, 2013). However, when a turbulence model is assessed it is usually assessed only at a limited range of inputs and not over a broad range (Kalvig et al., 2014). Very few papers combine a range of directions necessary to evaluate a wind farm model and use wind turbine SCADA data, for example in (Creech et al., 2015) they did this but used engineering wake models and not CFD-AD models. The work presented here uses an NS based, giving the reader more knowledge of the performance of such models in this case, so that the reader can decide which type of model to use for their own purposes as well as better understanding what impact modelling choices make in this application.
The main contribution of this paper is that a CFD simulation of an entire wind farm is conducted at multiple free-stream directions and speeds which is then compared to SCADA and met-mast data held for that wind farm, thus evaluating the approach in a real case. These simulations are conducted for two different steady-state RANS turbulence models, allowing a direct comparison. An additional contribution of this paper is that it demonstrates a modification to an open source actuator disk code in OpenFOAM (Svenning, 2010) which expands the code's capabilities by incorporating individual turbine performance. Methods for evaluating the model against real site data, by approximating the measurement data with regression models, is presented. The work not only demonstrates best practice which has been shown in other research, but also builds on it.
This work expands on a previous conference paper (Richmond et al., 2018) with a strong focus on the CFD aspect. Specifically, much more is done to evaluate the results against the wind farm data, the construction of the model is much more detailed and twice as many CFD cases are used.
In section 2, the reference offshore wind farm and the reference data available are presented. In section 3 the CFD model is discussed in detail including the boundary conditions, meshing strategy and the turbulence models. Section 4 presents the actuator disk model used and the modifications to it, section 5 outlines the validation approach including modelling of SCADA data for comparison, section 6 presents the results in comparison to the real data from the wind farm and discusses the findings. Finally, a conclusion is given in section 7.

Reference offshore wind farm
In this work, an offshore wind farm containing 25 turbines is modelled using actuator disks in a CFD domain. The comparisons are made to data obtained from a meteorological mast (met-mast, MM). This mast is permanently positioned on site roughly between two turbines on an outer corner. The MM has sensors which can record wind speed and direction at five elevations up to a maximum of the hub height of the turbines. These data are recorded as averages in 10-min intervals. A map of the reference wind farm is shown in Fig. 1, turbine 16 is circled in red. The figure shows that the reference zero direction for the CFD model is normal to the turbine rows. This orientation puts the MM in the wake of two turbines at the reference zero, but also shows some interesting behaviour at the two extremes because the met mast starts being in the wake of the turbines next to it which causes a large velocity deficit which allows for investigation of the edge of the wake. There are 25 turbines in the wind farm, but Fig. 1 shows 30 to protect confidentiality of the company who provided data.
Although the orientation of the direction values was not given, it is possible to determine this out from comparison of the MM data to the map. The typical convention is wind coming from the north, going south, is zero and then the value goes up clockwise (Mittelmeier et al., 2016). However, that assumption doesn't fit the MM data. The correct orientation for this wind farm is rotated 25 � counter clockwise from a north zero. This type of sensor calibration issue is not uncommon (Catapult, 2018).

Governing equations and numerical discretisation
Within this section the CFD framework is detailed. The governing equations, physics models and numerical schemes are presented. A computational aerodynamic database is constructed within a parameterization process.
Within the CFD simulation framework the governing equations of fluid dynamics are numerically solved. The Navier-Stokes equations for incompressible fluids encompass the principles of continuum mechanics and conservation laws which, along with more details on the solver used, can be found in (OpenFOAMWiki, 2019) and are also given here: where u i is the cartesian velocity vector with components in three spatial dimensions u ! ¼ ½u 1; u 2; u 3; � T , x i the spatial coordinate direction vector, respectively; where ρ is the fluid's density, p the pressure and μ the dynamic viscosity.
As the flow of wind turbines is turbulent of high-Reynolds number, Reynolds averaging is applied thus the final equations solved are the Reynolds Averaged Navier-Stokes (RANS). Reynolds averaging decomposes the mean flow variables to mean and fluctuating terms, where temporal and spatial ensemble averaging ensures steady state solutions are obtained. To close the set of equations Boussinesq approximation of the eddy-viscosity hypothesis is set where turbulence modelling mathematically describes the eddy viscosity μ t .
The governing equations are formulated within the finite volume numerical framework and discretised in three space dimensions, where the SIMPLE algorithm is employed to obtain numerical solutions (Mangani and Bianchini, 2007), (Ambatipudi, 2006). This is a widely used solver algorithm which has been used by other researchers in modelling wind turbines, including in OpenFOAM's simpleFoam (OpenFOAMWiki, 2014) solver, which is for incompressible, steady flow without changes in temperature (Stergiannis et al., 2016a), (Stergiannis et al., 2017), (a Digraskar, 2010). The SIMPLE algorithm solves the governing equations in a sequential approach, iteratively until convergence is reached.
The governing equations are discretised in time and the time derivatives are set to zero; the gradient scheme is the standard Gaussian finite volume discretisation, where the scalar gradient is computed using Green-Gauss theorem (Federer, 1944) and the values at the cell faces are linearly interpolated from the cell centre values. The divergence schemes are the second order Gauss for the velocity field and bounded Gauss for k, ε and ω. More on the OpenFOAM numerical schemes can be found on the OpenFOAM user guide (Greenshields, 2018).
The convergence criteria were set to 10 À 8 for difference in turbulence kinetic energy, k, and each velocity component and was set to 10 À 7 for the difference in pressure. These convergence criteria were reached after around 5000 iterations, with some variation of � 1000 for all case set-ups. Tracking values for velocity and turbulence kinetic energy at the met-mast during solving, the values used were appropriate as the physical values converged in around as many iterations.

Turbulence models
Several published works compare turbulence models used with AD models for wind turbines wake models, for example some researchers have found that standard k-ε under predicts wind speed deficit (Kalvig et al., 2014), (Cabez� on et al., 2011). Some researchers, comparing k-ε and k-ω models recommend k-ε because the high level of turbulence in k-ω leads to quick wake recovery and hence less accurate predictions in mid and far wakes regions (Stergiannis et al., 2017). Although modified versions of the k-ε and k-ω models can be less robust, they generally perform better than the standard models (Cabez� on et al., 2011), (Stergiannis et al., 2016b). Therefore, to use the most suitable model for this work, a comparison of several RANS turbulence models was conducted. The three turbulence models compared are standard k-ε, Realizable k-ε and k-ω SST.

Standard k-ε turbulence model
The standard k-ε model is a widely used turbulence model developed by Launder et al. (Spalding and Launder, 1972). This model calculates a turbulence kinematic viscosity, ν t , based on the turbulence kinetic energy, k, and dissipation rate, ε. The standard transport equations are: where x i;j are Cartesian space coordinates, u i;j;k are mean stream-wise velocity in the denoted Cartesian coordinate. ν is the kinematic viscosity and C μ , C 1 , C 2 , σ k and σ ε are standard constants given in the original publication from Launder et al. (Spalding and Launder, 1972) these values have been used by other authors in modelling wind turbines (Kalvig et al., 2014), (Stergiannis et al., 2016b), (Stergiannis et al., 2017), however some authors have shown that different choices of constant values are more appropriate for atmospheric flow (El Kasmi and Masson, 2008). From which the turbulence kinematic viscosity can be calculated using:

Realizable k-ε turbulence model
The realizable eddy viscosity model was developed by Shih et al. (1995) and implements a realizable eddy viscosity formulation through modified dissipation rate equation. This modified dissipation rate equation is (Fluent inc, 2006): where Sε is the mean strain rate and ν is the kinematic viscosity. The difference being that the Reynolds stresses do not appear in this equation. The standard model constants are given in the original publication (Shih et al., 1995).

K-ω shear stress transport
The k-ω shear stress transport (SST) model was introduced by Mentor el al (Menter, 1994). This model uses k-ω from Wilcox (2008) to calculate the inner boundary layer flow and a k-ε model in the free-shear flow.
The k-ω equations for turbulent kinetic energy, k and specific rate of dissipation, ω are: where x j is Cartesian space coordinate, ν is kinematic viscosity, F 1 , is a blending function, P k is a production limiter, and σ k;ω;ω2 , β, and α are closure coefficients given in the original k-ω formulation from Wilcox (2008).

Model set-up
The velocity inlet uses a fully developed wind shear profile. The logarithmic profile was fitted to an averaged set of met mast wind velocity samples, which records velocities at different heights. Only samples for which the met mast was in the free stream, and not in the wake of the wind farm, were used. There are some potential limitations to this, for example that this is the opposite to the direction at which the CFD cases used to build training data were conducted at; however, it gives the most accurate assessment of the local wind shear profile possible. Fig. 2 gives an example of what the samples are like, each dot represents an individual value and the five lines each link points from the same sample for the sake of representation. The velocity at 58 m was limited to 7.5 m/s � 0.1 m/s in this plot.
The velocity profile used is of the form in (9) which comes from the OpenFOAM API guide (OpenFOAM).
where K is the von Karmen constant, z g is the ground surface height, z 0 is the roughness height and u * is given by (10). From this study, it was found that the most suitable value for roughness height for this location was z 0 ¼ 0.015. Although since waves are predominantly wind driven, z 0 would change for a given wind speed.
The atmospheric boundary layer model used is the standard atmospheric boundary layer in OpenFOAM which is the model presented by Richards et al. (Richards and Hoxey, 1993). In all of turbulence models a turbulence kinetic energy value of 1.3 was used. Wall functions were used for the sea surface using the inlet values. The top wall uses a slip condition; however it is 2 km above the turbine hubs so should have no noticeable impact on the wind turbines.

Meshing strategy
The grid used is a structured grid with over 63 million cells. To construct such a large grid, OpenFOAM's blockMesh utility is used on an academic high-performance computer cluster which has 118 compute nodes, each node having 16 CPU cores with a base frequency of 2.10 GHz and 128 GB of shared memory. The domain is much larger than the area of the wind farm to avoid the influence of the walls on the turbines; the domain is 16 km wide, 6.8 km long and 2 km tall. The grid is refined in such a way to have smaller cells around where the turbines are located and larger cells further away from the turbines.
Despite being such a large grid with over 63 million cells, there are roughly 12 cells across the diameter of the turbines. This is a consequence of having a structured grid with such a large domain. The advantage of a structured mesh however is that it ensures a high quality mesh.
A grid sensitivity study is conducted to determine suitable cell sizes with a base set of values used being taken from a paper on validating OpenFOAM's actuator disk model (Javaheri and Cañadillas, 2013), though a more fine, structured grid is used in the end. The domain used for the work is shown in Fig. 3 the grid used is demonstrated in Fig. 4.

Original actuator disk code
The simpleFoam code is modified to implement an actuator disk model, with tangential force momentum source, to represent the turbines. The code used for the study is based upon Svenning's actuator disk model (Svenning, 2010) which represents the turbine not as an infinitely thin disk but as a three dimensional region where cell centres within the region are assigned new momentum terms. This gives more flexibility in changing the direction of the turbines.

Modifications to actuator disk code
To implement turbine performance it is necessary to determine thrust and torque curves for a given wind speed, such as the curves which can be found for the NREL 5-MW turbine as shown in (Jonkman et al., 2009). This was done by first filtering out data not representative of general turbine performance, e.g. speed below cut-in and pitch higher than usual values. Then generator power was converted to rotor power through an efficiency curve. Rotor speed was used to convert rotor power to rotor torque and finally velocity at the disk was used to convert rotor power to rotor thrust. This is resulted in clear performance curves which are very similar to the NREL curves in general shape. Scatter plots of the turbine performance are shown in Fig. 5.
To automate selection performance for each turbine, that is the perpendicular and tangential components of velocity, this performance needed to be modelled and the curve-fitted approximation model needed to be integrated into the AD code. Polynomial functions were then fitted to this data using the method of least squares, two polynomials for each value with the change occurring at rated power. From cut-in speed to 10.85 m/s a fourth order polynomial was used for both thrust and torque, after 10.85 m/s, a third order polynomial was used for thrust and a straight line for torque. These shapes were chosen after comparison to other shapes, they fit this particular data set the best. The use of a polynomial model was chosen because it fit the data very well, as can be seen in Fig. 5, and also it could easily and efficiently be These performance curves were based on inflow velocity, however this isn't what is observed by the turbines within the wind farm, so instead it was necessary to determine a theoretical upstream velocity based on the velocity at the disk and AD theory. The theoretical upstream velocity was used instead by the code to determine the thrust and torque values for each given turbine. Fundamental AD equations were used for this which come from (Burton et al., 2011). These equations are: where U D is the velocity at the disk, in the CFD code this is velocity averaged over the disk volume and in the SCADA data this is velocity measured by the anemometer on the nacelle, this is a source of some minor discrepancy.U ∞ is the far-field velocity, C T is the coefficient of thrust, A D is the area of the disk and a is the axial induction factor.  Equations (11), (12) and (13) were used to derive the following equation for axial induction factor in terms of thrust and velocity at the disk: where: This equation is solved iteratively within the AD code, for each turbine at each solver iteration, as thrust is dependent on axial induction. Finally, the original code used two point values (x,y,z) to define the 'start' and 'end' points which define the yaw of the turbine; all turbines set to a single direction. The same direction assumption is not necessarily true and the difference in mean direction between two turbines is an average of around 9% for this SCADA data, however this assumption was made for simplification. The process followed by the code is shown in Fig. 6.

Validation approach
Samples in the MM data occur at random input conditions. To overcome this, samples close to the CFD input conditions were taken, put into bins and averaged. The bins are made as small as possible to only take representative samples but still be able to determine an average, their size were chosen in order to have between 8 and 10 data points in each. Additionally, there are very few samples around 15 m/s inflow velocity but a very large amount around 7.5 m/s. Even with thousands of samples, some of the limits had to be large to ensure there were enough samples in each bin. Different limits were chosen for each inflow velocity with the aim of having good representation for each point, between roughly 3 to 10 samples each. The bin sizes as well as the MAE for each inflow velocity is presented in Table 1.
The values compared for the CFD are Root Mean Squared Error (RMSE) and Mean Absolute Error both as a value and a percentage of the value being predicted and are calculated using standard equations. The error is determined in comparing the CFD results to the real data from the wind farm.
With so much scatter in the met-mast and SCADA data, it is difficult to discern the general behaviour in the data from a scatter or other types of plots.
Although putting the data into averaged bins is a commonly used and straight forward approach, the resulting bins can be relatively large in some places, as shown previously in Table 1. Additionally, the observations within each bin are not weighted, therefore observations far from the bin centre have as much impact on the average as those far away. Regression models have the potential to be more representative of the data. Therefore, to determine the behaviour of the data as well as for comparison to the CFD values, regression models were applied to the SCADA/MM data. There are two independent variables, inflow velocity and inflow direction, which were used to predict the one independent variable, met-mast velocity. After restricting the direction to � 35 � from the zero-reference direction, data set was reduced to 5046 samples.
For the regression modelling, the 'regression learner' app in Matlab's 'Statistics and machine learning toolbox' was used which has a wide selection of regression methods. All of the models available in the app were tested and the ones which fit the data best, based on their RMSE   values, were Gaussian Process Regression using an exponential kernel function and an ensemble of trees using bagged trees with a minimum leaf size of 25 and 2000 learners. Gaussian process regression employs a Bayesian approach where a prior, which is a Gaussian distribution based on a 'noise' parameter for each point, is combined with observations to form a posterior (Do and Lee, 2008), (Davis, 2006), (Rasmussen and Williams, 2006). In Gaussian process regression the prediction is given by (Pandit and Infield, 2019): Pðf jXÞeNðf j0; KðX; XÞÞ (16) where f|X is the mean function and K(X,X) is the covariance matrix. K(X, X) is determined using the exponential kernel which defines it as (MathWorks, 2019a): where σ f and σ l are signal standard deviation and characteristic length scale respectively. The Euclidian distance between points x i and x j is defined as: r ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi À An ensemble of trees approach combines the predictions of multiple decision tree models, with their segregation of the variable space randomly determined. The use of multiple trees can reduce the level of overfitting to the data compared to a single tree (Breiman, 2001), (Kam Ho). For a regression problem, the prediction value is the weighted average using selected trees (MathWorks, 2019b).
where α t is the weight of the tree, b y t is the prediction from tree t, and S is the set of trees used in the prediction.
The goodness of fit parameters for these two models in predicting the data are given in Table 2. These values are limited by the scatter in the data. These two independent variables can predict a significant amount of the variation in the results, but do not completely account for the variation. However, for determining the general trend in the data and in comparison to the CFD results, which used the same independent variables, the results from these models are suitable based on their goodnessof-fit parameters.
For calculating the comparison values of RMSE and MAE, only the results from the ensemble method are used, not the GPR method. This is because ensemble is a simpler but more robust method, as the ensemble method takes averages from within multiple partitions, the result is more constrained to where the data points are, rather than GPR which is also somewhat constrained to forming a continuous function, which is not necessarily representative of the data. However, GPR is used for representation as it provides a smooth function which shows the behaviour well.

Grid sensitivity study
The discretisation error of the spatial numerical scheme can be assessed and evaluated with a grid sensitivity analysis. The grid sensitivity analysis entails the consistent generation of consecutively refined grids, at least three, simulations with the same numerical setup and evaluation of representative quantities (ASME, 2009). Four grids are constructed, the flow conditions are set as follows: velocity magnitude measured at 80 m above sea-level was used as a comparative value since much of the future comparisons are made from this. The simulations are performed at 0 � inflow direction at 10 m/s. Grids specifications are shown in Table 3 and solver times are using 2 nodes, with 16 cores each on the high-performance computer described in section 3.4.
For comparison, the velocity at the met-mast at 80 m was used. The results from this are shown in Fig. 7.
The grids were increased by increasing the number of cells along all directions by a certain, consistent amount and resulted in a grid refinement ratio of around 2 ( �0.1). Given that a second order scheme is used, the Richardson extrapolation can be used to determine an estimate for the continuum value (Roache, 1998).
where f h¼0 is the estimated value, f 1 and f 2 are the results from the first and second finest grids respectively, r is the refinement ratio and p is the model order.
Using the results obtained gives an estimated continuum value of 8.70 m/s, this is 0.5% higher than the value in grid 4 and 2.2% higher than the value for grid 3. The grid 3 level of refinement was used for the studies, this is due to resource constraints, given the number of cases to run it wasn't possible to use the highest refinement.

Turbulence model sensitivity study
The initial turbulence comparison was done at 0 � free stream velocity (normal to the rows along which the turbines are aligned) and at 15 � clockwise from this. The data compared is the velocity at the met mast.
The results from this comparison are in line with literature which states that the k-ε model is more accurate for this case and that the k-ω SST model over predicts the intensity of the wake (Kalvig et al., 2014). The results from this comparison are shown in Fig. 8 along with predictions from the two regression models which were trained on the SCADA data. As can be seen in the figures, the k-ω SST model in this case consistently shows much lower values than is observed by the real met-mast, while k-ε generally performs more in line with the real data, as is consistent with the findings of other researchers (Stergiannis et al., 2017). At higher velocities, around 15 m/s, and at 0 � , all models predict velocities that are too low. This under prediction by the models is partially due to having so few data points at higher velocities for comparison, and so these points come from slightly further away from the centre, where the wake is less prominent.
It is still possible that k-ω as implemented performs better than k-ε for some combinations of initial conditions, as is observed in both 0 and 15 � for 5 m/s, therefore the 35 cases are conducted with both a k-ω model and a k-ε model. The k-ε model chosen for this is standard k-ε. This is because standard k-ε is shown to predict closer to the MM data.

Case set-up
A wide range of cases were conducted to evaluate the model over a  Table 3 Grid statistics and solver time for grid sensitivity study including for each grid the number of cells, points and time taken for the solver to finish. spectrum of conditions. The wind farm is modelled at a range of incoming wind angles between 35 � and À 35 � and the freestream velocity is modelled between 5 m/s and 15 m/s. All combinations of direction including 0 � as well as plus and minus 10, 20 and 35 � are modelled with freestream velocities of 5, 7.5, 10, 12.5 and 15 m/s, resulting in 35 cases. The limit of incoming wind direction is chosen to capture a board range of inflow directions while still capturing a relatively high resolution in terms of angular variation.

Comparison to site data approximation
The numerical solutions are compared with real data to identify potential discrepancies and evaluate the levels of computational uncertainty. Therefore, data was extracted along a vertical line where the met mast is located to make predictions of local wind velocities.
Once the extracted data are generated from the CFD model, they are compared to the wind farm data samples to determine their accuracy. From each of the 35 CFD cases, local velocity values at the met-mast were sampled at hub height, the same as one of the sensors on the met-mast. The results from this comparison are shown in Table 4. The mean absolute error for the k-ε is 0.761 m/s, approximately 8% relative to the comparison values and k-ω SST is 0.932 m/s approximately 10.5%, using the binning approach. This is roughly in line with the error values achieved in CFD-AD by other authors comparing to wind tunnel measurements (Kalvig et al., 2014), (Stergiannis et al., 2016b) as well as for wind far measurements (Javaheri and Cañadillas, 2013).
The results from the CFD analysis are compared to the regression model and binning approaches for each individual data point in Fig. 9, which is a prediction against true plot where a 45-degree line indicates zero error between the two values. For both the binning and ensemble approaches there is good agreement to the CFD results, however the values are generally closer for the binning approach. This increased accuracy of the binning approach is partially due to an averaging of the result; the ensemble approach can show sharp fluctuations as the data fluctuates which would be averaged out in the binning approach. This is can be a desirable behaviour to some extent, but if there is too much averaging than it ceases to capture the behaviour of the real situation. Fig. 9 also shows larger deviations from the zero-error line in both approaches at higher wind speeds. This is at least partially due to a lower amount of data present at these higher values and so both approaches become less reliable. Additionally, this could be due to higher unsteadiness at higher velocities, if standard deviation values were present in the SCADA and met-mast data, this could be investigated. Fig. 10 shows the CFD predictions compared to the measured data from the SCADA/MM for 7.5 m/s free-stream condition. This figure shows the level of scatter present in the measured data and necessitates the use of the regression models. This also shows that a significant portion of the difference between CFD prediction and data comes from the difficulty in getting a reliable representation of the measurement data. As can be seen from other researchers (Javaheri and Cañadillas, 2013), (Avila et al., 2017) measured data is typically not as smooth and determinate as steady-state CFD results. Fig. 11 shows the results for the CFD model and the regression models, normalised by inflow velocity and averaged for inflow direction. At 0 � the MM is in the direct wake of two turbines, as seen in Fig. 12, which causes a minor dip in the wind flow velocity; this dip is much higher for the k-ω SST model than for the k-ε model. There are numerous things which can affect turbulence kinetic energy including calibration of constants, initialisation, boundary conditions and so on. In this case, the k-ω SST model predicts lower turbulence viscosity than the k-ε, and researchers have found that a lower turbulence viscosity results in less wake recovery (Stergiannis et al., 2016b).   Towards À 35-degree inflow direction there is a significant drop in the MM velocity shown by the regression models which is captured more closely by this k-ω SST turbulence model. The results from this direction are shown in Fig. 13. Although the met mast is roughly equidistant between the nearest two turbines, this significant drop is not observed in the þ35-degree direction. The drop at À 35 � appears to be an interaction of both the near wind turbine and two upstream turbines.
A typical result from the simulation in shown in Figs. 12 and 13 which show a two-dimensional cross section of the flow field through the turbine centres. This image shows the velocity magnitude field. The case is set up as described in the methodology section and the inputs for this particular case are inflow velocity of 10 m/s and an inflow direction of 0 � (normal to the three rows). Several observations can be made from this plot. This plot shows how, at this direction, the met-mast is in the wake of two turbines, and hence there is a velocity reduction at the MM. Another observation is that there is more of a velocity deficit on the right of the wake, when viewed in the direction of wind flow, than on the left. This is due to how the turbines interact with the wakes from upstream turbines. The third observation is that there appears to be a global reduction in velocityeven where there is no immediate wake, the velocity is nearly one m/s lower on the left side of the figure compared to the right of the figure.
The turbulence viscosity for k-ε, where the freestream direction is 35 � with a free-stream velocity of 10 m/s, is shown in Fig. 14. Throughout the paper, the k-ε model is mainly assessed due to it performing better for this model set up. This shows that the air downstream of the wind farm is still turbulent for a large distance. Additionally, the turbulence is not so high until after the third row. This figure shows that the wake of the turbine is much broader than it appears in the velocity plot. Just by looking at velocity it might appear there are times when the MM is not in a turbine wake, but from the turbulence plot one would conclude that this is never the case for the directions modelled. Table 4 shows the metrics for comparison of the two turbulence models against the binning and regression approaches. This shows that for conducting all cases k-ε outperforms k-ω for how these models were set up, however the difference is not as significant as it might be for a more limited range of inflow directions.

Conclusions
In this study a computational fluid dynamics analysis of an existing wind farm is conducted using the actuator disk method. Not only are the cases conducted at multiple wind directions and velocities, but the results are also compared to real recorded data from the wind farm. The opensource code OpenFOAM is used and modifications to the code are presented so that the process followed is reproducible. Additionally, methods for processing the wind farm data are presented.
The two turbulence models of standard k-ε and k-ω SST were compared and used for all simulations. It was found that both perform (a) (b) Fig. 9. Approximation method to measured data against prediction plots comparing the MM velocity values for CFD results to (a) ensemble regression (b) binning.  generally well but for this model setup k-ε performs better, with a mean absolute error of 8% for all cases compared to 10% for k-ω SST when binning is used to compare to the met-mast data. In the case where the met-mast is in the direct wake of two turbines up stream, k-ε shows good agreement with the recorded data. However, at 35 � , when the met-mast is on the edge of a very close turbine, k-ε underpredicts wake deficit. Therefore, an overall error value calculated for one direction only quantifies the error for that direction and not others. New methods for the use of SCADA data in both the construction and evaluation of the wind farm models are shown. Two approaches of approximating the SCADA data to more readily compare it to the CFD predictions are shown and compared. These approaches are binning the SCADA data and constructing regression models from the data. To compare the deterministic CFD results to the scattered measurement data, regression methods are used to approximate the MM data. The methods chosen are Gaussian Process Regression and Ensemble bagging, with mean absolute errors of 0.74 and 0.76 respectively. Both provide a suitable means of comparison however it is not determined which one is better.
The current practices for actuator disk method are well suited to modelling wake deficit for an entire wind farm at a range of inflow conditions. However, some care must be taken when inflow direction is  In terms of potential future work, with the growth in computational capabilities, CFD can more be applied in a way that expands the compromise between capturing flow complexity and computational time. By also combining machine learning approaches, using the CFD results as training data, this opens the possibility to use CFD for evaluating the wind farm in ways that were previously not possible.

Declaration of competing interest
No other conflict of interest is stated.