Looking for an Offshore Low-Level Jet Champion among Recent Reanalyses: A Tight Race over the Baltic Sea

With an increasing interest in offshore wind energy, focus has been directed towards large semi-enclosed basins such as the Baltic Sea as potential sites to set up wind turbines. The meteorology of this inland sea in particular is strongly affected by the surrounding land, creating mesoscale conditions that are important to take into consideration when planning for new wind farms. This paper presents a comparison between data from four state-of-the-art reanalyses (MERRA2, ERA5, UERRA, NEWA) and observations from LiDAR. The comparison is made for four sites in the Baltic Sea with wind profiles up to 300 m. The findings provide insight into the accuracy of reanalyses for wind resource assessment. In general, the reanalyses underestimate the average wind speed. The average shear is too low in NEWA, while ERA5 and UERRA predominantly overestimate the shear. MERRA2 suffers from insufficient vertical resolution, which limits its usefulness in evaluating the wind profile. It is also shown that low-level jets, a very frequent mesoscale phenomenon in the Baltic Sea during late spring, can appear in a wide range of wind speeds. The observed frequency of low-level jets is best captured by UERRA. In terms of general wind characteristics, ERA5, UERRA, and NEWA are similar, and the best choice depends on the application.


Introduction
During the last few decades, the growing demand for renewable energy has gone hand-in-hand with a rapid increase of wind power production. Some countries focus on investigating the possibilities for land-based wind turbines; others are more interested in offshore wind power [1,2]. While one option does not exclude the other, it is generally true that the wind speed is both higher and more stationary over water than over land; see Figure 1a. On the other hand, the cost of grid connection is significantly higher offshore than onshore [3].
The size of offshore wind turbines has grown during the last several decades, progressing from a hub height of 35 m and a capacity of 0.45 MW in 1991 to a 113 m height and 8 MW capacity in 2016 [4]. Today, the specifications for the new 15 MW offshore reference wind turbine describe a hub height of 150 m with blades sweeping heights from 30 to 270 m [5].
Large semi-enclosed basins such as the Baltic Sea and the Mediterranean Sea are in many ways ideal for establishing wind power parks [6]. The relatively short distance from the coast everywhere simplifies the infrastructure and lowers the cost for grid connection. However, there are many other aspects to consider when planning for a new offshore wind power site: sea depth, animal life, shipping and aerial routes, noise and light disturbance, and military restricted areas, to mention just a few (e.g., [7][8][9]). To date, the potential for wind power production in the Baltic Sea has mainly been utilized by Germany and Denmark, with more than 100 and 300 turbines, respectively, all located in the southwestern parts of the basin [10]. Sweden has a total of 81 turbines, and among the other countries around the Baltic Sea, it is only Finland that has an offshore wind park (10 turbines). Apart from pure offshore wind farms, there are many wind turbines that are placed on land in the coastal zone and, depending on wind direction, can also be considered to be located in marine conditions [11].
From a meteorological perspective, the proximity to the coast for large semi-enclosed basins creates special mesoscale conditions, which can cause deviations from a normally assumed logarithmic or power-law wind profile [12]. To a greater or lesser extent, mesoscale phenomena such as sea breezes, low-level jets (LLJs), boundary layer rolls, and air-sea energy transfer all influence the wind conditions (e.g., [13][14][15]). In order to use reanalysis data or other similar products for wind power investigations over large inland seas, it is important to investigate to what extent the datasets represent mesoscale phenomena.
In this paper, we focus on the characteristics of the LLJ, a phenomenon that has been extensively studied before, over the Baltic Sea (e.g., [14,16,17]) and elsewhere, in places as diverse as the North Sea [18], the Southeast Pacific Ocean [19], and the Weddell Sea [20]. Compared to a standard wind profile, conditions with an LLJ are qualitatively different regarding both wind speed and turbulence, affecting the performance and loads on the turbine, as well as the wake recovery rates [21,22]. LLJs can form in many different ways. While an LLJ is an inherent feature in the stationary stable boundary layer, most LLJs observed are present due to transitions in external forcing. The formation of the nocturnal onshore jet is well studied [23][24][25]. When the stability increases during the evening and night, the lowest part of the boundary layer experiences frictional decoupling, and the wind speed can increase, creating a local wind maximum. The process of frictional decoupling, often referred to as an inertial oscillation, can also happen at the coast, when air from land is advected over a relatively cold sea (typically during late spring and summer) [14,18]. The sea breeze circulation can also create favorable conditions for an LLJ to appear [26]. The so-called coastal jets are LLJs that are parallel to the coast, and can occur when there is a sharp temperature gradient between the land and sea surfaces [14,27,28]. Coastal jets can also form along ice edges [29].
In this paper, the wind conditions over the Baltic Sea are examined at heights relevant for wind power (hub height, as well as the heights swept by the blades), and low-level jets are studied in detail. Since measurements of wind speed at hub height are rare (especially offshore), gridded computer models are needed to describe the wind resource and assist in the decision about where to build a wind park. A reanalysis is an optimized gridded description of the atmosphere at a given time, and by comparing with observations, it is possible to study the performance of the reanalysis under different meteorological conditions. We analyze four commonly used and freely available state-of-the-art reanalyses and compare with LiDAR (light detection and ranging) measurements up to a 300 m height from four locations spread over the Baltic Sea. The reanalyses used within this study handle the physics differently, which in turn affects the resulting quantities (e.g., wind speed, low-level jets, and shear) to varying extents. We partly discuss these differences in the physical models where applicable; however, the aim of the paper is not to describe each reanalysis in detail, but to highlight the essential challenges when comparing observations with the selected reanalyses.
The paper consists of two parts. First, we study how well the reanalyses describe the general wind conditions at the sites in terms of average wind profiles, wind speed distributions, correlations, and average wind shear distributions over the rotor. In the second part of the paper, low-level jets are highlighted with a comprehensive study on how well the reanalyses capture the annual and diurnal cycle, as well as the height and speed of the jets.

Observations
The observational data used in this study came from wind-profiler LiDAR measurements, which have an advantage over those from traditional meteorological masts, in that they can reach higher, typically up to 300 m for wind energy purposes. The Doppler-wind LiDAR sends out a laser beam, which is reflected from moving particles in the air. By aiming the laser at an angle from the vertical (typically around 30°) in at least three different directions and recording the frequency of the backscattered signal, it is possible to calculate the wind speed using the Doppler shift. The emitted laser beam can be either continuous or pulsed. A continuous-wave LiDAR adjusts the focus of the laser beam to a specific height, thereby increasing the probability that backscatter on particles comes from that particular height. This has the consequence that the measurement volume becomes a strong function of the measurement distance. A short distance enables a narrow focus, whereas a long distance prevents a narrow focus [30]. A pulsed LiDAR, on the other hand, sends a short laser pulse and determines the measurement distance from the time between the emitted pulse and the received backscatter. In this way, it retains the same vertical measurement volume for all levels, resulting in a smaller uncertainty at higher levels than a continuous-wave LiDAR, but a larger uncertainty at lower heights [31].
For reference, approximate vertical extents of measurement volumes (expected to contain 2/3 of the backscatter) are ±4 m at a 50 m height, ±15 m at a 100 m height, and ±150 m at a 300 m height for a continuous-wave LiDAR, while they are approximately ±15 m at all heights for a pulsed LiDAR [30].
Four sites with recent LiDAR measurements covering at least one year were used as observational data; see Figure 1a for the locations. The different locations and their setup are described in brief below. The time periods for which data were available from the different sites are shown in Figure 1b, and the measurement heights in Figure 1c.

Anholt
The measurements from the Anholt site were performed with a pulsed LiDAR (Leosphere Windcube v2) during 2013 and 2014 on a platform just west of the Anholt wind farm. It is a pure offshore station with at least 20 km to the nearest land surface, but the distance between the LiDAR and the closest wind turbine was only approximately 1.5 km. The hub height of the wind turbines is 81.6 m. Wind speed data were recorded as 10 min averages, covering a period of two years, and the measurements were at ten heights: 65, 85, 101, 105, 125, 141, 185, 225, 275, and 315 m above sea level. In previous work [32], the Anholt LiDAR data were used to study the effects of environmental conditions on wind turbine performance.

FINO2
The FINO2 meteorological tower was deployed on the southern edge of the reef Kriegers Flak in 2007, and during one year, from July 2012 to July 2013, the tower was accompanied by a pulsed Leosphere Windcube v2 LiDAR placed on the same platform. The main reason for the measurement campaign was to perform a quality control of the wind speed measured by the anemometers mounted on the mast [33]. LiDAR measurements were recorded at eleven levels (62, 72, 82, 92, 102, 120, 140, 160, 200, 240, and 280 m above sea level), and wind speed data were stored as 10 min averages. The distance to the closest land surface is at least 35 km in any direction.

Östergarnsholm
The Östergarnsholm measurement station is a meteorological research site located at the southern tip of a small (approximately 2 km²), flat island 4 km east of Gotland (e.g., [34,35]). The site is part of ICOS (Integrated Carbon Observation System) and includes, among other things, a 30-m-high land-based tower with high-frequency measurements and a continuous-wave scanning LiDAR, Z300 ZX LiDARs, modified to measure up to 300 m and store raw data [36]. The LiDAR was deployed in December 2016 and measured at heights of 28, 39, 50, 100, 150, 200, 250, and 300 m above sea level. In this paper, data up to and including December 2018 were used.

Utö
Utö lies at the southern edge of the Finnish archipelago in the Baltic Sea, 60 km southwest of the mainland. Utö is a small island with an area of approximately 1 km², and the highest point is <20 m above sea level. The nearest islands are of a similar size and are approximately 10 km away [37]. The data used in this paper were from the time period February 2015 to December 2018. Depending on the winter conditions, the sea around Utö can freeze. During the measurement period, sea ice cover in the Baltic Sea extended to Utö only during parts of February, March, and early April 2018. Utö is part of the Finnish ground-based remote-sensing network, and during the measurement period, the island hosted a number of measurements in addition to the Doppler LiDAR utilized here [38]. A Halo Photonics Stream Line pulsed Doppler LiDAR [39] was located at 8 m above sea level at Utö, and a 15° elevation angle velocity azimuth display (VAD) scan was configured with 24 azimuthal directions every 15 min. The radial resolution of the measurement was 30 m, and the integration time per beam was 7 s. Raw data were post-processed according to [40], and a signal-to-noise ratio threshold of −23 dB was applied to the radial measurements before wind retrieval from the VAD. The first three range gates were excluded from the analysis, and thus, horizontal winds were available from 35 m above sea level at a 7.8 m height resolution.
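As an illustration of how a VAD scan yields a wind vector, the following minimal sketch fits the three wind components to radial velocities from equally spaced azimuths. It is a generic textbook retrieval, not the post-processing of [40]; the function names, the sign convention (radial velocity positive away from the instrument, azimuth clockwise from north), and the synthetic test scan are assumptions made for this example only.

```python
import math

def synthetic_scan(u, v, w, elevation_deg, n=24):
    """Radial velocities that a uniform wind (u, v, w) would give in a VAD scan
    with n equally spaced azimuths (hypothetical helper for demonstration)."""
    el = math.radians(elevation_deg)
    scan = []
    for i in range(n):
        az = 2.0 * math.pi * i / n
        scan.append(u * math.sin(az) * math.cos(el)
                    + v * math.cos(az) * math.cos(el)
                    + w * math.sin(el))
    return scan

def vad_wind(radial_velocities, elevation_deg):
    """Retrieve (u, v, w) from radial velocities at equally spaced azimuths.

    For equally spaced azimuths covering the full circle, the least-squares
    fit reduces to discrete Fourier sums because the sine, cosine, and
    constant terms are mutually orthogonal.
    """
    n = len(radial_velocities)
    el = math.radians(elevation_deg)
    azimuths = [2.0 * math.pi * i / n for i in range(n)]
    s_sin = sum(vr * math.sin(az) for vr, az in zip(radial_velocities, azimuths))
    s_cos = sum(vr * math.cos(az) for vr, az in zip(radial_velocities, azimuths))
    u = 2.0 * s_sin / (n * math.cos(el))             # eastward component
    v = 2.0 * s_cos / (n * math.cos(el))             # northward component
    w = sum(radial_velocities) / (n * math.sin(el))  # vertical component
    return u, v, w

def vertical_spacing(range_resolution_m, elevation_deg):
    """Height difference between consecutive range gates along a slanted beam."""
    return range_resolution_m * math.sin(math.radians(elevation_deg))

u, v, w = vad_wind(synthetic_scan(5.0, -3.0, 0.1, 15.0), 15.0)
print(round(math.hypot(u, v), 2), round(vertical_spacing(30.0, 15.0), 1))
```

With a 30 m radial resolution and a 15° elevation angle, the vertical spacing evaluates to 30 m × sin(15°) ≈ 7.8 m, which matches the height resolution quoted for Utö.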

Reanalyses
One general problem that lies in the nature of gridded data (e.g., reanalyses) when comparing with observations is that, while the observations are usually point measurements, a reanalysis describes the average conditions in the grid box [41]. This also implies that reanalyses are better at capturing average values than extremes. The ability to simulate rapid variations in space is limited by the horizontal resolution of the model, and thus, coastal areas can be problematic. The lack of information about the uncertainty in most reanalyses is one of their biggest weaknesses [42]. Thus, research that assesses the uncertainty of reanalysis datasets by means of comparison with observations that are not incorporated in the assimilation process is generally a valuable contribution to the scientific community.
Using data from four recent reanalyses, the wind profile from the grid point closest to each observation location was compared with the LiDAR measurements from that site. The reanalyses were selected based on their use within the wind power community and on the requirement that the data be publicly accessible and free of charge. The reanalyses are described in brief below, and the features that were most relevant for the analysis in this paper are summarized in Table 1. The height levels from the reanalyses used in this paper are shown in Figure 1c. For more information about the specific data files used, see the section on Data Availability.

MERRA2
The Second Modern-Era Retrospective analysis for Research and Applications (MERRA2) is a global atmospheric reanalysis from NASA released in 2015 [43]. The resolution is 0.625° in the longitudinal direction and 0.5° in the latitudinal, corresponding to a horizontal resolution of approximately 40 km × 55 km over the Baltic Sea area. Data are available from 1980 onwards and make use of the latest instrumentation on the satellites, which the former version of MERRA could not. Data are assimilated using the Goddard Earth Observing System Model Version 5 (GEOS-5) and the Gridpoint Statistical Interpolation (GSI) scheme, and the model has 72 levels in the vertical [44,45].
The surface roughness over the ocean is calculated using a polynomial combining the algorithms suggested in [46] and [47], adjusted with more recent observations. Sea surface temperature (SST) is taken from the OSTIA dataset [48]. The surface layer turbulence is parameterized using a scheme by [49] based on Monin-Obukhov similarity theory in which effects from heat and moisture transport in the viscous sublayer over the ocean are included. Above the surface layer, the turbulence is parameterized using a combination of the Richardson-number-based scheme by [50] and the non-local scheme by [51]. For details on the MERRA2 surface roughness and turbulence parameterizations, we refer the reader to [45].
Instantaneous values of the wind components u and v from the four lowest model levels were downloaded together with temperature, specific humidity, and surface pressure to allow for calculation of the height of the model levels. Furthermore, the wind components at a 10 m height were downloaded. Data are provided in time steps of three hours.
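A common way to obtain model level heights from temperature, specific humidity, and pressure is to integrate the hypsometric equation upwards from the surface. The sketch below is a simplified illustration under stated assumptions: the pressure at each model level is taken as already known (in practice it is derived from the surface pressure and the level definition of the model), the layer-mean virtual temperature is approximated by the average of the two bounding levels, and all function names are hypothetical rather than part of any reanalysis toolkit.

```python
import math

R_DRY = 287.05   # gas constant for dry air, J kg-1 K-1
G0 = 9.80665     # standard gravity, m s-2

def virtual_temperature(temp_k, spec_hum):
    """Virtual temperature (K) from temperature (K) and specific humidity (kg/kg)."""
    return temp_k * (1.0 + 0.608 * spec_hum)

def level_heights(surface_pressure, level_pressures, temps, spec_hums):
    """Heights (m above the surface) of model levels ordered from the lowest upward.

    Pressures are in Pa. Over the sea, height above the surface is
    approximately height above sea level.
    """
    heights = []
    z = 0.0
    p_below = surface_pressure
    tv_below = None
    for p, t, q in zip(level_pressures, temps, spec_hums):
        tv = virtual_temperature(t, q)
        # Assumption: the lowest layer uses the virtual temperature of the
        # first level; higher layers use the mean of the bounding levels.
        tv_mean = tv if tv_below is None else 0.5 * (tv + tv_below)
        z += R_DRY * tv_mean / G0 * math.log(p_below / p)
        heights.append(z)
        p_below, tv_below = p, tv
    return heights

print([round(h) for h in level_heights(101325.0, [100000.0, 99000.0],
                                       [288.15, 287.5], [0.006, 0.006])])
```

Because the level pressures follow the surface pressure, the heights obtained this way change from one time step to the next, so the calculation has to be repeated for every profile.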

ERA5
The latest version of the global reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) called ERA5 (ECMWF Reanalysis 5th Generation) uses a 10-member ensemble 4D-Var data assimilation with 12 h windows and ECMWF's Integrated Forecasting System (IFS). The horizontal resolution is approximately 17 km × 31 km over the Baltic Sea, and the model runs with 137 hybrid sigma model levels from the surface to the top of the atmosphere. Data output is hourly and consists of both analyses (at 06 and 18 UTC) and short forecasts in between. The dataset is known to have a mismatch in the wind speeds at times when the data assimilation cycle changes (at 09 and 21 UTC) [52]. At the moment, data are available from 1979 to present, but when complete, ERA5 will have a record starting in 1950 [53].
Surface roughness in ERA5 depends on the vegetation type and snow cover. Over the ocean, the roughness is a function of the waves, calculated using the ECMWF Wave Model [54,55]. SST is assimilated using a combination of the IFS and the NEMO model [56]. In the surface layer, the scheme for turbulent diffusion is based on a first-order K-diffusion closure of the Monin-Obukhov similarity theory. K-diffusion closure is also applied above the surface layer, except for unstable conditions when the scheme is switched to an eddy-diffusivity mass flux framework. See [54] for more details on the physical parameterizations in ERA5.
In order to get the wind profile in the lowest part of the atmosphere, the wind components u and v were downloaded for the 15 lowest model levels. As for MERRA2, this was accompanied by data for surface pressure, temperature, and specific humidity to enable calculation of the height of the model levels above sea level.

UERRA
The European reanalysis UERRA (Uncertainties in Ensembles of Regional ReAnalyses) was created in a European FP7 project (the European Commission 7th Framework program) under the leadership of SMHI and uses the HARMONIE 3D-Var data assimilation system [41]. Data from ERA-Interim (the predecessor of ERA5) [57] are used along the lateral borders, and the resolution is 11 km × 11 km. For the years before 1979, data from ERA40 (the predecessor of ERA-Interim) [58] are used on the borders. The time series starts in 1961 [59].
The UERRA-HARMONIE system uses 65 terrain-following sigma coordinates in the calculations, and data are available at 11 height levels from 15 to 500 m. Analyses at 00, 06, 12, and 18 UTC were combined with short forecasts (1-5 h) in order to get hourly data from UERRA. However, these short forecasts suffer from spin-up issues, primarily for the first two hours. In strong turbulent conditions, this has been known to render wind speeds which are too high [41]. For an overview of the UERRA forecast skill over land, see [60].
For surface fluxes and variables, UERRA uses the SURFEX platform [61]. Over land, the surface roughness is calculated from the topography, the snow depth, the fraction of vegetation, and the Leaf Area Index. Over the ocean, iterative bulk parameterizations [62,63] based on the results from several flux measurement campaigns are used. The parameterizations are optimized to cover a wide range of sea states. Over lakes, the Charnock parameterization [64] is used to calculate the surface roughness [65]. A combination of ERA5 and NEMO is used for the SST. The turbulent transport in the boundary layer is described with a prognostic diffusion scheme for turbulent kinetic energy [66] and a scheme for shallow convection [59,67].

NEWA
The New European Wind Atlas (NEWA) was created to quantitatively assess the wind conditions for wind energy production in Europe, both onshore and offshore [68,69].
A modified WRF (Weather Research and Forecasting) model, Version 3.8.1 [70], was used to perform mesoscale simulations in a nested grid with 3 km × 3 km resolution for the inner domain. The WRF model was set up for 8 day forecasts (including 1 day spin-up) and was forced with ERA5 at the boundaries. The outer nest was anchored to ERA5 data using spectral nudging above the boundary layer [68]. Simulations were made for 10 different regions, welded together along their borders to cover the entire European Union plus Turkey.
The aerodynamic roughness length used in NEWA was based on fixed values for different types of land use; see [69] and [68] for details. For all water bodies, the baseline roughness length was set to 0.0001 m, and the alteration by wind-driven waves followed the modified bulk layer COARE algorithm by [62]. In the surface layer, Monin-Obukhov similarity theory was used to parameterize turbulent diffusion, and vertical damping was added. A modified MYNN-scheme [71] was used for the turbulence in the boundary layer [68]. Sea surface temperature was from OSTIA.
Data were available in 30 min time steps for the 30-year period 1989-2018 at eight height levels, ranging from 10 to 500 m.

Quality Control of the LiDAR Measurements
All observational data were put through a quality control. If the wind speed at the measurement height closest to 100 m was missing, the wind profile for that time step was removed. Furthermore, if 70% or more of the data in a profile were missing, the profile was removed. For Anholt, data at a 315 m height from 23 November 2014 to 31 December 2014 were removed upon inspection. Furthermore, for Anholt, all profiles when the wind direction was from outside the sector 189-333° were removed due to possible wake effects from the Anholt wind farm, following the method of [32]. The wind speed at 92 m from the FINO2 LiDAR was compared to the wind speed measured by the FINO2 tower anemometer at a 92 m height, and profiles were removed when the ratio was more than 20% off. However, if temperatures were below 0 °C, profiles were kept due to possible ice-related malfunction of the anemometer. Spikes were removed in all datasets, even though this was mainly an issue for Utö. In total, this led to a data removal of approximately 51% for Anholt, 8% for FINO2, 1% for Östergarnsholm, and 5% for Utö.
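The two general completeness rules above can be written as a short filter. This is only a sketch of those two rules; the site-specific steps (wake sector, tower comparison, spike removal) are not included, and the function name and the use of None for missing values are assumptions made for the example.

```python
def passes_quality_control(heights, speeds, reference_height=100.0,
                           max_missing_fraction=0.7):
    """Return True if a wind profile passes the two completeness rules.

    heights: measurement heights in m; speeds: wind speeds in m/s,
    with None marking a missing value (an assumption of this sketch).
    """
    # Rule 1: the level closest to the reference height must have data.
    ref_index = min(range(len(heights)),
                    key=lambda i: abs(heights[i] - reference_height))
    if speeds[ref_index] is None:
        return False
    # Rule 2: reject the profile if 70% or more of its values are missing.
    missing = sum(1 for s in speeds if s is None)
    return missing / len(speeds) < max_missing_fraction

anholt_heights = [65, 85, 101, 105, 125, 141, 185, 225, 275, 315]
profile = [8.1, 8.4, 8.6, 8.7, None, 9.0, 9.2, None, None, None]
print(passes_quality_control(anholt_heights, profile))
```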

Comparison with Reanalyses
For the comparison between the LiDAR measurements and the reanalyses, the grid point closest to the location of each observation station was selected; see Table 2. This implies that for observations close to land and for reanalyses with low resolution, there was a risk that the closest grid point was located in a grid box that did not perfectly resemble the conditions at the observation site. Figure 2 shows the example of Östergarnsholm, where the land/sea-mask from the reanalyses is plotted together with the grid. As can also be seen in Figure 2, the horizontal resolution is crucial to resolving the coastal zone and geographical features properly.
All data, both from observations and from reanalyses (when necessary), were time-averaged to hourly values with the time stamp at the beginning of each hour. For MERRA2, the wind speed calculated from the u and v components was linearly interpolated to hourly data. Only time steps with concurrent data from observations were used.
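The time alignment described above can be sketched as follows. The example assumes that each 10 min value is assigned to the hour in which its time stamp falls and that the coarse reanalysis series is regularly spaced; both function names are hypothetical. Note that, as stated in the text, the wind speed is computed from u and v first and then interpolated, rather than interpolating the components.

```python
import math
from datetime import datetime, timedelta

def hourly_means(samples):
    """Average (datetime, value) samples per hour, stamped at the start of the hour."""
    sums = {}
    for stamp, value in samples:
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        total, count = sums.get(hour, (0.0, 0))
        sums[hour] = (total + value, count + 1)
    return {hour: total / count for hour, (total, count) in sorted(sums.items())}

def coarse_to_hourly_speed(times, u, v):
    """Compute wind speed from u and v, then interpolate linearly to hourly steps.

    times must be sorted and separated by whole hours (e.g., 3 h for MERRA2).
    """
    speeds = [math.hypot(ui, vi) for ui, vi in zip(u, v)]
    hourly = {}
    for i in range(len(times) - 1):
        n_hours = int((times[i + 1] - times[i]).total_seconds() // 3600)
        for k in range(n_hours):
            weight = k / n_hours
            hourly[times[i] + timedelta(hours=k)] = (
                speeds[i] + (speeds[i + 1] - speeds[i]) * weight)
    hourly[times[-1]] = speeds[-1]
    return hourly

start = datetime(2017, 1, 1, 0)
ten_minute = [(start + timedelta(minutes=10 * i), 7.0 + 0.1 * i) for i in range(12)]
print(hourly_means(ten_minute))
print(coarse_to_hourly_speed([start, start + timedelta(hours=3)], [3.0, 6.0], [4.0, 8.0]))
```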
Since the observations only covered a few years and with more data from some seasons (e.g., Östergarnsholm data were temporally biased to the winter months, with measurements starting in December 2016 and running up to, and including, December 2018), the results could not be considered as representative for the climate at the site. However, the observations were compared with reanalysis data for the exact same time steps, and the results should be interpreted in that light.
In order to compare the wind speed measured by the LiDARs with the reanalyses, the wind speed at 150 and 300 m was extracted. In some cases, the winds at these heights were explicitly given (such as for Östergarnsholm and UERRA); otherwise, the wind was calculated by fitting a piecewise cubic Hermite interpolating polynomial (PCHIP) [72,73] on a logarithmic height scale to the two levels closest to this height (observations) or to the full profile (reanalyses). Compared to a spline, the PCHIP concentrates the curvature closer to the interpolation points, thereby avoiding the typical swings that can occur with a spline while still enabling a continuous description of the profile. For the FINO2 observations, the wind speed at 300 m was extrapolated from the wind speed measurements at 240 and 280 m using the PCHIP.
To study the wind profile, the wind speed at every 10th meter from the lowest observational height up to 300 m was calculated in a similar way, fitting a PCHIP to the full profile in logarithmic height coordinates and only using observations with complete profiles. The average wind profile was then calculated based on the hourly profiles. Similarly, the wind shear over the rotor was analyzed using profiles with interpolated values at every 10th meter, from a height of 50 to 250 m.
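A PCHIP fit in logarithmic height coordinates can be sketched with SciPy's PchipInterpolator. This is an illustration of the interpolation idea rather than the exact processing used for the results; the function name, the example profile, and the choice to allow extrapolation are assumptions.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def interpolate_profile(heights_m, speeds, target_heights_m):
    """Interpolate a wind profile with a PCHIP fitted on a logarithmic height axis.

    heights_m must be strictly increasing and positive.
    """
    log_z = np.log(np.asarray(heights_m, dtype=float))
    fit = PchipInterpolator(log_z, np.asarray(speeds, dtype=float),
                            extrapolate=True)
    return fit(np.log(np.asarray(target_heights_m, dtype=float)))

# Measurement heights at Östergarnsholm and made-up wind speeds (m/s).
heights = [28, 39, 50, 100, 150, 200, 250, 300]
speeds = [7.2, 7.6, 7.9, 8.8, 9.3, 9.5, 9.4, 9.2]
targets = np.arange(30, 301, 10)  # every 10th meter up to 300 m
profile = interpolate_profile(heights, speeds, targets)
print(np.round(profile[:5], 2))
```

A profile that is exactly logarithmic in height is a straight line in these coordinates, so the PCHIP reproduces it without error; deviations from the logarithmic shape, such as a low-level jet, are followed without the overshoot a cubic spline may introduce.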
Note that due to low vertical resolution in some reanalyses, interpolations can sometimes be quite long, which also implies a greater uncertainty. While the interpolations for MERRA2 and ERA5 have to be performed on pressure-based model levels that vary in height from one time step to the next, the direct output on height levels in UERRA and NEWA simplifies the calculations. However, pressure-based model levels are also used in UERRA and NEWA when the simulations are run, and the fixed height levels are derived from these.

Finding the Low-Level Jet
A low-level jet (LLJ) is a local wind maximum in the lowest part of the atmosphere, typically with the core located at a 50-300 m height. While there is no strict definition of an LLJ, it is often considered that the fall-off from the core has to be at least 1 or 2 m s⁻¹ when comparing with the minimum value in the wind profile a few hundred meters above the core; see Figure 3 for an illustration.
Since a clear definition of the LLJ does not exist, both a weaker (≥1 m s⁻¹ fall-off) and a stronger (≥2 m s⁻¹ fall-off) LLJ criterion were evaluated. For a fair comparison, when scanning the wind profiles for LLJs, the reanalyses were limited to start at the same height as the measurements, and all wind profiles were cut at a 300 m height. If more than one LLJ was identified in the profile, the height and speed of the core with the maximum fall-off above are presented.
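The fall-off criterion can be expressed as a short search over the profile. The sketch below is one possible reading of the description above, not necessarily the exact algorithm behind the results: it assumes that the core must be an interior local maximum (the lowest and highest levels cannot be cores) and that the fall-off is measured against the minimum wind speed anywhere above the core within the truncated profile. The function name is hypothetical.

```python
def find_llj(heights, speeds, min_falloff=1.0, max_height=300.0):
    """Return (core_height, core_speed, falloff) of the strongest LLJ, or None.

    heights in m (increasing), speeds in m/s. The profile is cut at max_height.
    """
    profile = [(z, s) for z, s in zip(heights, speeds) if z <= max_height]
    best = None
    for i in range(1, len(profile) - 1):
        z, s = profile[i]
        # Candidate core: wind speed exceeds the level below and is not
        # exceeded by the level directly above.
        if not (s > profile[i - 1][1] and s >= profile[i + 1][1]):
            continue
        falloff = s - min(speed for _, speed in profile[i + 1:])
        # Keep the candidate with the largest fall-off above the core.
        if falloff >= min_falloff and (best is None or falloff > best[2]):
            best = (z, s, falloff)
    return best

heights = [50, 100, 150, 200, 250, 300]
speeds = [6.0, 9.0, 8.0, 7.5, 7.2, 7.4]
print(find_llj(heights, speeds, min_falloff=1.0))  # jet under the weaker criterion
print(find_llj(heights, speeds, min_falloff=2.0))  # no jet under the stronger one
```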
Previous studies (e.g., [18,74]) stated that reanalyses typically overestimate the height of the LLJ core and that the errors for the relative frequency of occurrence are reduced if reanalyses are allowed to stretch up to 500 m. However, this prevents a fair comparison of the jet's core height and speed.

Wind Profile
The wind profiles were analyzed from the lowest measurement level up to 300 m, and are shown in Figure 4 with the heights where data were provided marked.
For all sites, it was clear that MERRA2 suffered from a large negative bias, underestimating the wind speed by typically 1 or 2 m s⁻¹. The other three reanalyses were fairly similar, but still displayed some important differences. A previous study came to the conclusion that MERRA2 has poor performance compared to ERA5 for applications in the wind industry [75]. ERA5 and UERRA have very similar average wind profiles, with ERA5 showing slightly less bias for both Anholt and FINO2. For all sites, NEWA has less wind shear on average than ERA5 and UERRA, either with a higher wind speed than the others at lower levels, a lower wind speed higher up, or both. The differences between the wind profiles from the reanalyses and the observations were statistically significant (t-test, 95% confidence level) for FINO2 and Östergarnsholm, and for the higher levels in the Anholt profile. For Utö, UERRA and NEWA had a consistent negative bias, but the levels were close to the 95% confidence bound. ERA5 had a negative bias in most of the lower part of the Utö profile, but agreed well with the observations higher up. The differences between the reanalyses can mainly be explained by the differences in handling the surface roughness. While MERRA2 uses a combination of [46] and [47], the more modern parameterization by [62], which also rests on a larger set of observations, is implemented in UERRA and NEWA. Furthermore, the various turbulence parameterizations used and differences in vertical resolution in the lower part of the boundary layer affected the results. For a further discussion on the general negative bias in wind speed in marine conditions by reanalyses, see [76]. For a comparison of different parameterization schemes and their effect on the offshore wind profile over the Baltic Sea, we refer the reader to [77].
Comparing different sites, it is also clear that the wind profile is much better captured by the reanalyses at some of the sites (Anholt and Utö) than others (Östergarnsholm and especially FINO2). For the case of Östergarnsholm, this could partly be explained by the influence of the larger island Gotland, less than 5 km west of Östergarnsholm. During southwesterly winds, an internal boundary layer from Gotland could stretch all the way out to Östergarnsholm [36], giving a more land-like wind profile. ERA5 and UERRA perform better under these circumstances than when the wind is directed from the open sea (results not shown). For MERRA2 and NEWA, there is less difference in the performance for the different wind directions. It is also a well-known problem in all reanalyses and weather models that accurate modeling of the stable boundary layer is challenging, despite recent improvements in both data assimilation and schemes for turbulent mixing during these conditions (see, for example, [78][79][80] for further discussion).
Regarding the large biases for FINO2, no plausible explanation was found, as this is a pure offshore site with at least 35 km to the closest shoreline and the closest grid point in the reanalyses also in a close range. The small amount of data that went into the profile for FINO2 could be a possible explanation for the large spread. A much better agreement was achieved comparing the reanalyses with 10 years of data from the FINO2 tower up to 100 m (not shown). The FINO2 LiDAR was quality controlled against the FINO2 tower (see Section 2.3), taking the flow distortion from the tower into account [81], but even after this quality control, it is evident from Figure 4 that the FINO2 profile is significantly different from the others. Thus, the FINO2 LiDAR data could be considered to have lower data quality than the other sites, and therefore the results including measurements from FINO2 should be interpreted with some caution.

Wind Speed at Hub Height
While the average wind profile is a climatological measure, Taylor diagrams also take the correspondence in time into account, showing both the correlation coefficient and the centered root-mean-squared deviation (CRMSD), as well as the standard deviation. An exact match of the LiDAR data would result in a correlation coefficient of one, a CRMSD of zero, and the same standard deviation as the LiDAR observations. Figure 5 shows the Taylor diagrams at a 150 m height (a typical hub height for future offshore wind turbines) from the four sites. Analyzing the diagrams, it is clear that MERRA2, ERA5, and UERRA were more tightly coupled to the observations in time than NEWA, as they had higher correlation coefficients (ERA5 had the best performance). This was not surprising considering the frequent updates of the analysis fields in those products compared to the long forecasts used in NEWA; see Table 1. This result is also in line with the NEWA benchmark [82], where a LiDAR device on a ship was measuring along a track over the southern Baltic Sea, concluding that for wind speeds at a 100 m height, ERA5 had a higher correlation (r = 0.94) than NEWA (r = 0.88).
Similar reasoning as for the correlation also applied to the CRMSD. The CRMSD depends on both the model resolution and the correlation. Although not intuitive, it is often the case that high-resolution models have poor correlation [83]. This is reflected in the results, with NEWA, which has the highest resolution and the longest forecasts, consistently showing a larger CRMSD than the other reanalyses tested.
Typically, the standard deviation in MERRA2 was too low; in other words, MERRA2 underestimated the variation in the data. This was generally also the case for the other reanalyses, although not to the same extent. UERRA had a variance that was closer to the observed than the others.
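The three quantities summarized in a Taylor diagram can be computed directly from two paired hourly series. The following is a minimal sketch and not the processing used in this study; the function name `taylor_stats` is hypothetical, and population (1/N) statistics are assumed.

```python
import math


def taylor_stats(obs, model):
    """Return (correlation, CRMSD, std_obs, std_model) for paired series.

    Assumes both series share the same time steps and contain no gaps.
    """
    n = len(obs)
    if n != len(model) or n < 2:
        raise ValueError("series must have equal length of at least 2")

    mean_obs = sum(obs) / n
    mean_mod = sum(model) / n
    # Anomalies: removing the means is what makes the RMSD "centered",
    # so a constant bias does not affect any of the returned statistics.
    d_obs = [o - mean_obs for o in obs]
    d_mod = [m - mean_mod for m in model]

    std_obs = math.sqrt(sum(d * d for d in d_obs) / n)
    std_mod = math.sqrt(sum(d * d for d in d_mod) / n)
    if std_obs == 0.0 or std_mod == 0.0:
        raise ValueError("correlation is undefined for a constant series")

    cov = sum(a * b for a, b in zip(d_obs, d_mod)) / n
    corr = cov / (std_obs * std_mod)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(d_obs, d_mod)) / n)
    return corr, crmsd, std_obs, std_mod
```

The three statistics are linked by CRMSD² = σ_obs² + σ_model² − 2 σ_obs σ_model r, which is why they can be shown in a single diagram. Because the means are removed, the average bias has to be reported separately.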
As a complement to the Taylor diagrams, Table 3 summarizes the average bias at 150 and 300 m. Compared to Figure 4, where the average bias can also be seen in the wind profiles, this table includes all hours with data at 150 or 300 m, not only the times when the observed profile was complete (no missing data) up to 300 m. Since processes close to the surface are complicated to resolve in any meteorological model, it could be expected that the agreement between the reanalyses and observations would improve with increased height. It was therefore interesting that for Anholt, FINO2, and Östergarnsholm, the reanalyses generally performed better at 150 m than at 300 m; only for Utö was the opposite true. It was also notable that although ERA5 has a coarse horizontal resolution compared to UERRA and NEWA, its high vertical resolution was sufficient to capture the average conditions as well as (or even better than) the high-resolution reanalyses.
Regarding the frequency distributions of wind speed (Weibull distributions; see Figure 6 for distributions at a 150 m height), it is clear that MERRA2 suffered from its poor resolution and systematically underestimated the wind speed (compare with Figure 4). ERA5, UERRA, and NEWA all captured the distribution satisfactorily, but for Anholt, both ERA5 and UERRA overestimated the wind speed. For Östergarnsholm, NEWA underestimated the frequency of the most common wind speeds. Comparing the distributions at 100 m (not shown), NEWA performed better than the others for Östergarnsholm, but otherwise, the results were similar.
Looking at the residuals in Figure 6, it is evident that MERRA2 overestimated the frequency of low wind speeds and underestimated the frequency of stronger winds for all sites. For FINO2, this was the case for all reanalyses. However, the opposite was true for Anholt and Östergarnsholm. Here, ERA5, UERRA, and NEWA slightly underestimated the frequency of 5-10 m s⁻¹ winds and overestimated the frequency of winds stronger than 10 m s⁻¹. For Utö, ERA5, UERRA, and NEWA tended to overestimate the low wind speeds and underestimate the stronger winds, just like MERRA2, even though the residuals were very small.

Average Wind Shear
The average wind shear over the rotor is an important measure, both for loads on the wind turbine and for wind resource estimation. An evaluation of the average shear, based on a 15 MW offshore reference wind turbine [5] with a hub height of 150 m and a diameter of 200 m, is presented in Figure 7. Since only measurements from Östergarnsholm and Utö include data at a 50 m height, only results from these two stations are shown. Due to the large errors in the wind profile, MERRA2 was omitted in this and all further analysis.
Wind shear occurs when either the wind speed or wind direction changes with height. In this paper, we only considered the change of wind speed. The wind shear was calculated on the hourly profiles as ∆u/∆z in steps of 10 m from 50 to 250 m, and then the average shear was computed. A standard (power-law) wind profile typically has a slight positive average shear or a value very close to zero since most of the shear over open water takes place in the lowest 50 m of the profile. Negative shear occurs when the wind profile is tilting "backwards", i.e., the wind speed is decreasing with height. Large positive values of shear, on the other hand, occur when the wind speed is rapidly increasing in the 50-250 m layer. This is typical for high geostrophic wind speed, strong stable conditions (capping inversions), and/or cases with a strong low-level jet.
The peak in the observations at very small positive values (approximately 0.001-0.002 s⁻¹) was well captured by NEWA, but the frequency was exaggerated. This manifested itself in that NEWA underestimated both the frequency of negative shear and also some of the positive shear (at 0.003-0.013 s⁻¹ for Utö, but at more extreme shear, 0.015-0.025 s⁻¹, for Östergarnsholm). ERA5 and UERRA both had a tendency to overestimate the positive shear for Östergarnsholm considerably, while all reanalyses underestimated the negative shear to approximately the same degree. For UERRA, this can partly be explained by the fact that the grid point closest to the Östergarnsholm observation station is located on the eastern tip of Gotland (see Figure 2) with a land/sea-mask value of 0.31 (Table 2). A higher surface roughness also implies a higher shear in the 50-250 m layer. As described in Section 2.2, the reanalyses also handle the aerodynamic roughness quite differently.
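The average shear over the rotor described above can be sketched as follows. This is an illustration only: the function name `average_shear` is hypothetical, and the profile is assumed to be already available on a 10 m grid from 50 to 250 m (any vertical interpolation needed to get there is not shown).

```python
def average_shear(heights, speeds):
    """Mean of the layer-wise shear du/dz (in s^-1) over a wind profile.

    heights: increasing heights in m, e.g. 50, 60, ..., 250
    speeds:  wind speed in m/s at those heights
    """
    if len(heights) != len(speeds) or len(heights) < 2:
        raise ValueError("need matching profiles with at least two levels")

    layer_shear = [
        (speeds[i + 1] - speeds[i]) / (heights[i + 1] - heights[i])
        for i in range(len(heights) - 1)
    ]
    return sum(layer_shear) / len(layer_shear)


# Hypothetical example: a profile whose wind speed drops above 100 m,
# giving a negative average shear over the 50-250 m layer.
heights = list(range(50, 260, 10))
speeds = [9.0 + 0.02 * (z - 50) if z <= 100 else 10.0 - 0.01 * (z - 100)
          for z in heights]
shear = average_shear(heights, speeds)
```

With equally spaced levels, the mean of the layer values reduces to (u(250 m) − u(50 m)) / 200 m, so the average shear is sensitive only to the wind speed difference across the rotor and not to the shape of the profile in between.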
For Utö, UERRA still overestimated the shear (despite a land/sea-mask value of zero), but ERA5 was almost perfect in the positioning of the peak. However, while UERRA captured the negative shear satisfactorily, the occurrence of these conditions was underestimated by ERA5.
The average wind shear distribution for Östergarnsholm was significantly different when the wind was directed from land compared to open water. While the distribution for the open sea sector was similar to the one shown in Figure 7, the distribution when the wind direction was from land (Figure 8) was much flatter and, as expected, indicated more cases with higher shear. However, even for the land sector, ERA5 and UERRA still overestimated the positive shear, except for values above 0.025 s⁻¹. NEWA overestimated the positive shear up to 0.015 s⁻¹, but then underestimated the occurrence of stronger shear. Another possible explanation for the general overestimation of the shear by ERA5 and UERRA could be that the diffusivity in the turbulence schemes used in these reanalyses (see Section 2.2) is too small. The division of the wind direction into different sectors for Östergarnsholm followed [36].
To conclude which reanalysis most accurately represented the observed distributions regarding the average wind shear over the rotor, the Earth mover's distance (EMD) was used as an objective metric [84]. The EMD is equal to the area between the cumulative distribution functions and can be described as a measure of how much work is needed to transform one distribution into another (i.e., how similar the distributions from the reanalyses are to the observed distribution at each site).
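For one-dimensional data such as the average shear, the EMD can be evaluated directly as the area between two empirical cumulative distribution functions. The sketch below works on raw samples; the function name `emd_1d` is hypothetical, and whether the study used raw samples or binned distributions is not stated here, so this should be read as one possible implementation.

```python
from bisect import bisect_right


def emd_1d(sample_a, sample_b):
    """Area between the empirical CDFs of two 1-D samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    a = sorted(sample_a)
    b = sorted(sample_b)
    # Both CDFs are step functions that only change at sample values,
    # so it is enough to integrate piecewise between those values.
    grid = sorted(set(a) | set(b))

    area = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        area += abs(cdf_a - cdf_b) * (right - left)
    return area
```

The result has the same unit as the input (s⁻¹ for shear), and a smaller value means that the reanalysis distribution is closer to the observed one. If SciPy is available, `scipy.stats.wasserstein_distance` computes the same quantity for two samples.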
Interestingly, the EMD values in Table 4 show that for Östergarnsholm, NEWA was better than ERA5 and UERRA when the wind was from the sea sector (fetch over open water of at least 140 km), but performed worse when the wind was directed from Gotland. Overall, UERRA had the distribution that was closest to the average wind shear distribution measured by the Östergarnsholm LiDAR.
For Utö, where the wind from all wind directions is practically unaffected by land for at least 40 km, it could be expected that the EMD for NEWA would be smaller than the values for ERA5 and UERRA, just as for the sea sector for Östergarnsholm. However, the opposite was true, with UERRA being the reanalysis with the distribution most similar to the observations, followed by ERA5.

Low-Level Jets: Fall-Off and Frequency Bias
In Figure 9, the maximum fall-off above a local maximum is presented for all wind profiles in the observations and the corresponding time steps in the reanalyses. As stated previously, data from MERRA2 were excluded from the LLJ analyses due to poor vertical resolution. Note that for a standard wind profile, the fall-off was zero (since there was no local maximum in the profile), and thus the vast majority of the data points were located at the origin. Furthermore, as is typical for rare events, the number of correct rejections outnumbered the number of hits, false alarms, and misses. The contingency data presented in the figure correspond to the 1 m s⁻¹ criterion, also indicated by the gray lines showing the limit. By shifting the criterion to 2 m s⁻¹, it is obvious from the figure that the number of LLJ cases decreased drastically, both for the observations and the reanalyses.
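The maximum fall-off and the contingency counts can be sketched as below. The exact LLJ definition is given earlier in the paper and is not reproduced here, so the details are assumptions: a local maximum is taken as an interior level that is faster than the level below and at least as fast as the level above, the fall-off is measured to the lowest wind speed anywhere above it, and an LLJ is counted when the fall-off reaches the threshold. The function names are hypothetical.

```python
def max_fall_off(speeds):
    """Largest drop (m/s) from a local maximum to the minimum above it.

    speeds: wind speeds ordered from the lowest to the highest level.
    Returns 0.0 when the profile has no local maximum.
    """
    best = 0.0
    for i in range(1, len(speeds) - 1):
        is_local_max = speeds[i] > speeds[i - 1] and speeds[i] >= speeds[i + 1]
        if is_local_max:
            best = max(best, speeds[i] - min(speeds[i + 1:]))
    return best


def contingency(obs_fall_offs, model_fall_offs, threshold=1.0):
    """Count hits, false alarms, misses and correct rejections.

    Both inputs hold one fall-off value per time step, paired in time.
    """
    counts = {"hits": 0, "false_alarms": 0, "misses": 0,
              "correct_rejections": 0}
    for obs, mod in zip(obs_fall_offs, model_fall_offs):
        obs_llj = obs >= threshold
        mod_llj = mod >= threshold
        if obs_llj and mod_llj:
            counts["hits"] += 1
        elif mod_llj:
            counts["false_alarms"] += 1
        elif obs_llj:
            counts["misses"] += 1
        else:
            counts["correct_rejections"] += 1
    return counts
```

Calling `contingency` with `threshold=2.0` instead of the default reproduces the stricter criterion; every time step that was an LLJ under the 2 m s⁻¹ criterion is also an LLJ under the 1 m s⁻¹ criterion, so the number of cases can only decrease.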
The number of false alarms (when there was an LLJ in the reanalysis, but not in the observations) was typically much lower than the number of misses (when there was an LLJ in the observations, but not in the reanalysis) for ERA5 and NEWA, meaning that these reanalyses did not resolve LLJ events often enough. UERRA gave a much better result and also consistently showed a higher number of hits (perfect timing of the LLJs). Even though the number of false alarms was lower than the number of misses, it is interesting that the density of the data for all sites except Anholt indicates that the reanalyses had a (weak) local maximum in the wind profile more often than the observations. The underestimation of the number of LLJs can also be seen in Figure 10, where the frequency bias (FBIAS) is presented. The frequency bias is the ratio of the total number of predicted LLJs for a site to the total number of observed LLJs, and should equal one for a perfect score. A value of the frequency bias less than one implies an underestimation of the frequency of LLJs, and vice versa.
A possible explanation for the general underestimation of the frequency of LLJs is that the reanalyses have too-high turbulent mixing during stable conditions, flattening the wind profile and "smearing out" anomalies such as LLJs [18,79].
While the contingency data presented in Figure 9 use a strict 1:1 correspondence in time to classify an LLJ event as a hit, FBIAS is a more climatological measure, considering only the total number of LLJ cases in both observations and reanalyses. In Figure 10, results are shown for both the 1 m s⁻¹ and 2 m s⁻¹ criteria, and it is clear that the more tolerant 1 m s⁻¹ criterion consistently gave values closer to one (except for UERRA for Anholt), with a major improvement for ERA5. Based on this result and since more data gave more reliable statistics, the following discussion will only consider the 1 m s⁻¹ criterion.
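In terms of contingency counts, the number of predicted LLJs is hits plus false alarms, and the number of observed LLJs is hits plus misses. A minimal sketch of the frequency bias follows; the function name and the example counts are hypothetical and are not values from this study.

```python
def frequency_bias(hits, false_alarms, misses):
    """FBIAS = predicted events / observed events."""
    observed = hits + misses
    if observed == 0:
        raise ValueError("no observed events; FBIAS is undefined")
    return (hits + false_alarms) / observed


# Half as many LLJs in the reanalysis as in the observations.
underestimate = frequency_bias(hits=30, false_alarms=10, misses=50)

# A perfect frequency bias without a single correctly timed LLJ.
no_hits = frequency_bias(hits=0, false_alarms=5, misses=5)
```

The second example shows why FBIAS is a climatological measure: it can equal one even when no event is correctly timed, so it complements rather than replaces the contingency data.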
Coming back to Figure 9, it is interesting to note that even for well-pronounced LLJs, when the observed fall-off was 5 m s⁻¹ or more, the reanalyses struggled to capture the event, revealing a difficulty in accurately resolving phenomena which are both temporally and spatially local. The opposite also occurred: the reanalyses sometimes falsely created a rather strong LLJ when no LLJ was present according to the observations. The figure corroborates the findings in [85] for the North Sea, where it was also clear that the number of misses was much higher than the number of false alarms and that NEWA performed better than ERA5. UERRA was not analyzed in [85].
Figure 9. Scatter plots of the maximum fall-off in the reanalyses versus the maximum fall-off as measured by the LiDARs. The gradient of the color indicates the density of the data (brighter color: higher density). The gray lines mark the 1 m s⁻¹ limit used to classify a low-level jet, and the 1:1 ratio is also marked. Contingency data with the number of hits, false alarms (FAs), correct rejections (CRs), and misses are also given, with the number for correct rejections referring to the number of data points in the lower-left corner in each panel.

Annual and Diurnal Cycle of Low-Level Jets
Regarding the annual cycle (Figure 11a), all observation sites showed that the LLJs were most common in late spring, with a maximum in May. This is a well-known feature of LLJs from earlier studies [14,18], and is connected to the stable stratification that is typical over the Baltic Sea in this season, with warm air advected from land over water that is still cold after the winter. The stable stratification leads to frictional decoupling and a subsequent acceleration. In addition, the sea breeze circulation, which becomes increasingly common during late spring, can also induce an LLJ [26]. At Utö, LLJs were present almost 60% of the time in May, and thus it is clear that the wind conditions at the site could not be adequately described with a standard wind profile assumption. The annual cycle was most pronounced for FINO2, Östergarnsholm, and Utö, and for all these sites, UERRA had a relative occurrence closer to that of the observations than did ERA5 and NEWA.
Figure 11b presents the diurnal cycle for the season April to July (when LLJs are most common). In contrast to land, where LLJs are most frequent during the night, several different processes can form offshore LLJs, resulting in a more constant probability for an LLJ to occur throughout the day. No clear diurnal pattern was visible in the observations, and even though the spread in the data was large, it is still interesting to note that the reanalyses showed somewhat different trends. For example, for Anholt and Östergarnsholm, NEWA displayed a sequence with more LLJs in the afternoon and evening, and fewer LLJs during the night and morning. For Östergarnsholm, the same pattern was also seen in ERA5 and was even more pronounced. UERRA, on the other hand, had a somewhat opposite cycle with more LLJs during the night and morning than in the afternoon and evening for all sites.
While no type of erroneous diurnal pattern was preferred over another, it is important to keep in mind that the diurnal LLJ cycles from the reanalyses cannot be trusted blindly. It is also important to note that the annual and diurnal cycles cannot be seen as representative for the climate mean at the sites, as the averages only represent the time period for which observational data were available.

Core Height and Core Speed of Low-Level Jets
To characterize how well the reanalyses capture the height and speed of the LLJ core, boxplots are presented in Figure 12. Since the height levels were discrete for both the observations and the reanalyses, the details in the left panel of Figure 12 should be interpreted with some caution. Nonetheless, it is clear that both ERA5 and NEWA typically underestimated the height. UERRA, on the other hand, underestimated the height only for Utö, but overestimated the height for Anholt, FINO2, and Östergarnsholm. Regarding the distributions, the reanalyses typically could not capture the highest LLJs well, but UERRA was better than the others in this sense. As noted earlier, previous studies [18,74] concluded that models typically place the LLJs too high up and that the fall-off above the core is too weak, leading to an underestimation of both the number of LLJs and the core height. However, it is also possible that there were a number of "false" LLJs in the LiDAR data due to uncertainty in the measurements.
Regarding the core speed, it is striking that LLJs could occur over a wide range of wind speeds, from very calm to stormy conditions. Furthermore, the median speed of the jet core was comparable to the average wind speed at the sites, referring back to Figure 4. It is important to keep in mind that the vertical resolutions in the datasets were quite different. In profiles with high vertical resolution, the likelihood that the core height and speed of an LLJ would be accurately described was higher. There was a systematic underestimation of the core speed, possibly connected to the underestimation of the core height of the LLJs by the reanalyses. ERA5 performed worst regarding core speed. UERRA and NEWA were more similar, but UERRA was better for the extreme cases.
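The effect of discrete height levels on the core estimates can be illustrated with a short sketch. The core is assumed here to be the local maximum with the largest fall-off, provided that the fall-off reaches the threshold; this is an assumption made for illustration, and the function name `llj_core` and the example profile are hypothetical.

```python
def llj_core(heights, speeds, threshold=1.0):
    """Return (core_height, core_speed) of an LLJ, or None if there is none.

    heights, speeds: profile ordered from the lowest to the highest level.
    """
    core = None
    largest_drop = 0.0
    for i in range(1, len(speeds) - 1):
        is_local_max = speeds[i] > speeds[i - 1] and speeds[i] >= speeds[i + 1]
        if not is_local_max:
            continue
        drop = speeds[i] - min(speeds[i + 1:])
        if drop >= threshold and drop > largest_drop:
            largest_drop = drop
            core = (heights[i], speeds[i])
    return core


# Hypothetical profile on a coarse 50 m grid.
heights = [50, 100, 150, 200, 250, 300]
speeds = [6.0, 9.0, 11.0, 10.0, 8.5, 9.0]
core = llj_core(heights, speeds)
```

The returned core height can only take the values of the available levels, so a jet whose true maximum lies between two levels is assigned to one of them, and its core speed is underestimated. The coarser the vertical grid, the larger this effect becomes.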
Selecting only the cases when the observations and the reanalyses had an LLJ at the same time (i.e., the hits), the ratio of the core height and core speed could be compared directly between the reanalyses and the observations; see Figure 13. Since the number of hits was much higher for UERRA than for ERA5 and NEWA, much more data went into this plot for UERRA than for the other reanalyses. Interestingly, while ERA5 underestimated the core height overall (Figure 12), it slightly overestimated the height for the hits at Anholt, FINO2, and Utö. The spread around the center line was quite large for all sites for both core height and core speed, sometimes over- or underestimating the core values by as much as a factor of four, revealing the difficulty of correctly describing the LLJ even when the timing is perfect.
Figure 13. Boxplot of the ratio of the core height (speed) in the reanalysis to the measured core height (speed) for the time steps when there is a simultaneous LLJ in the observations and in the reanalysis (i.e., the hits). The number of hits for each site and each reanalysis are given in Figure 9. Values closer to one indicate a better agreement between the reanalysis and the observations. Overestimation of the core height or speed gives a ratio larger than one; underestimation gives a ratio smaller than one.

Discussion
In the quest to find the reanalysis most suitable for offshore wind applications among recent freely and publicly available reanalyses, it is clear that all reanalyses have their pros and cons.
First of all, our analysis does not support the use of MERRA2 for offshore wind energy purposes, as both the horizontal and vertical resolutions are insufficient to adequately describe the wind conditions at a site. Between ERA5, UERRA, and NEWA, it was, however, a tight race.
The linear interpolation that was applied to the MERRA2 data in order to get hourly time steps added uncertainty to the results. As wind is not linear in its nature, the linear interpolation is a simplification that can sometimes cause large errors. To improve the method, temporal changes in the observed wind speed could be used as a basis for the interpolation. However, with a sufficiently long time series, errors from the linear interpolation should even out, as the wind speed would be overestimated as often as underestimated. In an attempt to quantify the systematic error from the linear interpolation from three-hour time steps to hourly data, the method was applied to seven years of UERRA data at Utö. At a 100 m height, the interpolation resulted in an underestimation of the wind speed of 0.09 m s⁻¹. Furthermore, the wind speed distribution was narrower as the variation in the data decreased. Interpolating the magnitude of the wind speed always resulted in a higher (or equal) wind speed than interpolating the u and v components separately and then calculating the wind speed.
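The last point follows from the triangle inequality: the length of a weighted average of two vectors can never exceed the weighted average of their lengths. The sketch below compares the two interpolation variants for a single time step; the function names and wind values are hypothetical and only illustrate the inequality.

```python
import math


def interp_speed_magnitude(u0, v0, u1, v1, w):
    """Interpolate the wind speed itself; w in [0, 1] is the time weight."""
    return (1.0 - w) * math.hypot(u0, v0) + w * math.hypot(u1, v1)


def interp_speed_components(u0, v0, u1, v1, w):
    """Interpolate u and v separately, then compute the wind speed."""
    u = (1.0 - w) * u0 + w * u1
    v = (1.0 - w) * v0 + w * v1
    return math.hypot(u, v)


# A 10 m/s wind that turns by 90 degrees between two analysis times.
by_magnitude = interp_speed_magnitude(10.0, 0.0, 0.0, 10.0, 0.5)    # 10 m/s
by_components = interp_speed_components(10.0, 0.0, 0.0, 10.0, 0.5)  # about 7.07 m/s
```

The two variants agree only when the wind direction is the same at both times; the more the wind turns between the time steps, the more the component-wise interpolation reduces the interpolated speed.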
ERA5 and UERRA better matched observed wind profiles and had a higher temporal correlation with measurements than NEWA. Wind atlases are not created as tools for forecasting, but rather to describe the meteorological conditions at a site in the best way possible. However, wind atlases are still sometimes used for case studies or to initialize models, and thus the lower correlation is important to keep in mind. As noted in Section 3.2, it is probable that much of the lower correlation in NEWA is a consequence of high horizontal resolution and long forecasts. It is well known [83,86] that models with coarser resolution and smoother solutions can get better scores compared to higher-resolution models when using standard verification metrics. However, upon subjective inspection, the higher-resolution datasets could have a more realistic representation of local features, even though the exact timing and placing might be slightly off. Coarsening the NEWA data spatially to the same horizontal resolution as ERA5 or UERRA could possibly improve the skill scores. However, there may be physical connections between variables that would be lost in the averaging process, and NEWA would still suffer from the long forecasts used compared to ERA5 and UERRA.
The longer time series in ERA5 and UERRA (currently around 40 and 60 years, respectively) provide an advantage over NEWA (30 years), but if high horizontal resolution is crucial for the application in mind, then NEWA may still be the best option.
ERA5, UERRA, and NEWA all had similar wind speed distributions at hub height, but looking at the average wind shear over the rotor, there was a clear difference between the reanalyses. UERRA had a shear distribution that was more similar to the observed distribution than that of ERA5. For Östergarnsholm, NEWA was the best choice to describe the shear when the wind was from the sea sector, but UERRA was better when the wind was directed from land.
Since mesoscale phenomena such as LLJs are very common in the coastal zone, with LLJs occurring as often as 60% of the time in May at Utö, it is of utmost importance that a reanalysis can capture these types of events to give a trustworthy wind climatology over the Baltic Sea. The fact that LLJs are local in space and time makes it hard for weather models to resolve them properly, but succeeding in this would improve, for example, energy production forecasts.
The frequency of LLJs agreed better with observations in UERRA than in ERA5 or NEWA, both of which underestimated the frequency by more than 50%. Regarding the core height and core speed, both UERRA and NEWA described the distribution better than ERA5, but UERRA captured the extremes more accurately than NEWA. Despite this, UERRA is not perfect, and it can be seen in Figure 11b that the diurnal LLJ cycles in the reanalyses cannot be trusted as they are.
While this study is by no means a complete overview of all the different factors that influence wind conditions in the coastal zone, we still suggest that UERRA should be the first choice when analyzing LLJs, either used as-is or as input data for downscaling. However, with ERA6, the new version of the reanalysis from ECMWF, planned to start production in 2023, and new work packages to improve NEWA likely underway, this conclusion might need to be revised in the not too distant future. UERRA will itself be updated to CERRA (Copernicus European Regional ReAnalysis) in 2021 with 5.5 km resolution [87], and it will be interesting to see if it can keep its position as the best choice. It should also be mentioned that there are several other available reanalyses which are not discussed in this paper, such as the European COSMO-REA6 [88] and the global JRA-55 [89], which are also possible candidates that should be investigated further. Due to insufficient overlap with the observations from Östergarnsholm and Utö at the time of the analysis, COSMO-REA6 was excluded from evaluation in this paper. Carvalho [76] compared surface winds in JRA-55 and MERRA2, and concluded that the reanalyses had similar error metrics.
It is important to note that despite the quality control, there may still be systematic measurement errors and random errors in the comparison. In Section 2.1, it is mentioned that the pulsed and continuous-wave LiDARs have uncertainties in measuring wind speed at an exact height. Furthermore, under some weather conditions, such as very dry air with few aerosols (typical for northerly winds over the Baltic Sea) or when low clouds are blocking the path of the laser beam, the backscattered signal can become too weak, leading to discarded measurements and, in the long term, to a bias in the data since these types of weather conditions are excluded from the analysis. Finally, a more rigorous evaluation of the reanalyses would require measurements from more sites and covering more years, as the spatial and inter-annual variability, depending on the dominating synoptic weather conditions, is large.
A deeper understanding of how to improve the modeling of mesoscale processes in the coastal zone, especially during stable conditions, would require sensitivity studies to determine which processes are crucial and to identify the physics and turbulence schemes with the best performance. Specific research points resulting from this study indicate the particular importance of the diffusivity of the turbulence schemes and of the lower boundary conditions, including roughness and the air-sea exchange of heat. For offshore conditions, it is crucial that the turbulence schemes can handle sharp gradients close to the surface. Furthermore, fundamental and applied research combined with longer time series of data from offshore wind measurements (both high meteorological masts and LiDAR measurements) and more observation sites is important to allow for a better understanding of the winds in the marine atmospheric boundary layer.

Conclusions
In this paper, LiDAR measurements up to a 300 m height from four sites in the Baltic Sea were compared with four state-of-the-art reanalyses (MERRA2, ERA5, UERRA, NEWA), focusing on the meteorological preconditions for offshore wind power in the basin. The reanalyses were evaluated in terms of general wind conditions, with a specific focus on how well they describe wind maxima at low levels, so-called low-level jets. The results provide insight into the accuracy of reanalyses for wind resource assessment.
We conclude that there was a general underestimation of the average wind speed by all reanalyses tested. Further, the vertical resolution in MERRA2 was insufficient to properly resolve the wind profile. The average wind shear over the rotor of a reference 15 MW offshore wind turbine was too low in NEWA, but was overestimated by ERA5 and UERRA. Regarding general wind characteristics (e.g., wind speed distributions), the performances of ERA5, UERRA, and NEWA were more similar. The best choice of which reanalysis to use depends on the application.
It was shown that low-level jets are common over the Baltic Sea: at Utö, LLJs were present during almost 60% of the time in May. Thus, LLJs are important mesoscale phenomena to consider when estimating the offshore wind power potential in the Baltic Sea. LLJs can appear within a wide range of wind speeds, and the median speed of the jet core was comparable to the average wind speed at a site. The core speed was systematically underestimated by the reanalyses. UERRA was the reanalysis that best captured the frequency of LLJs throughout the year when compared to observations, and also captured the extreme cases with high wind speeds better than the others.
Data Availability: Data used for creating the results in this paper were collected as follows: LiDAR data for Anholt and FINO2 were provided by [90]. MERRA2 data for wind components, temperature, and specific humidity on model levels were collected from the Global Modeling and Assimilation Office [91], and the wind speed at 10 m and surface pressure from [92]. Data for the MERRA2 land/sea-mask were collected from [93]. All data from ERA5 (hourly values on model levels for wind components, temperature, and specific humidity, hourly data on a single level for surface pressure, and land/sea-mask) were generated using the Copernicus Climate Change Service [94]. UERRA data for wind speed (analyses and forecasts) at height levels were downloaded via the ECMWF MARS data service (2020-02-03). The land/sea-mask was generated using the Copernicus Climate Change Service [95]. Neither the European Commission nor ECMWF is responsible for any results in this paper. NEWA data for wind speed and land/sea-mask were obtained from the New European Wind Atlas [96], a free, web-based application developed, owned, and operated by the NEWA Consortium. For additional information, see www.neweuropeanwindatlas.eu.