Added value of regional reanalyses for climatological applications

Regional reanalyses constitute valuable new data sources for climatological applications, providing consistent fields of commonly requested meteorological parameters, e.g., wind speed, solar radiation, temperature and precipitation. Within the European project Uncertainties in Ensembles of Regional ReAnalyses (UERRA), three different numerical weather prediction (NWP) models have been employed to generate different European regional reanalyses and subsequent surface reanalysis products. The uncertainties of the individual reanalysis products and of the combined UERRA multi-model ensemble are investigated by comparison against observations. Here, we provide guidance on the meteorological parameters and spatial-temporal scales for which regional reanalyses add value to global reanalyses. The reanalysis fields are compared to station measurements and derived gridded fields, as well as to satellite data. In general, reanalyses are especially valuable in data-sparse areas, where the NWP models are superior to traditional gridding procedures based on station observations in transporting information. For wind speed at heights relevant for wind energy, where few conventional observations exist, regional reanalyses can provide higher horizontal, vertical and temporal resolution, adding value to global reanalyses. Solar radiation fields generally capture the variability; however, they are prone to model-dependent biases. Temperature fields were generally found to be in good agreement with station observations, but biases in the (moderately) extreme values are a potential pitfall for threshold-based applications such as climate indices. Comparisons of the precipitation fields in different areas of Europe demonstrate that different reanalyses excel in different regions. The multi-model ensemble of regional reanalyses was found to provide better uncertainty estimates than an ensemble from a single reanalysis system alone.
The freely available regional reanalyses provide a new, high-resolution data source, which might be attractive for many applications, especially where conventional data are sparse or restricted by data policies.


Introduction
Applications in need of meteorological or climatological data can draw from a growing abundance of new data sources: regional reanalyses (e.g., Staffell and Pfenninger 2016). Reanalyses use a fixed, modern numerical weather prediction (NWP) model version to reconstruct the meteorological conditions of the past, considering the available historical observations. This has been done with remarkable success for the widely used global reanalyses (e.g., Kalnay et al 1996, Dee et al 2011, Ebita et al 2011, Rienecker et al 2011), which serve tens of thousands of users in need of climatological information (Buizza et al 2018). User requests for higher spatial and temporal resolution have been addressed by various downscaling methods, with varying success depending on the meteorological parameter, the applied method (e.g. Sheffield et al 2006, and references therein) and the scale (Schaaf et al 2017).

Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Advances in computing power have made possible regional reanalyses, which can ingest additional observations and achieve higher resolution by means of regional NWP models, e.g., the North American Regional Reanalysis (Mesinger et al 2006) and the Arctic System Reanalysis (Bromwich et al 2012), followed by recent efforts in Europe (Bollmeyer et al 2015, Dahlgren et al 2016, Jermey and Renshaw 2016, Gleeson et al 2017) as well as in other parts of the world (Buizza et al 2018).
Recently, within the European Seventh Framework Programme (EU FP7) project Uncertainties in Ensembles of Regional ReAnalyses (UERRA), comparisons and uncertainty estimation of several regional reanalyses and surface reanalysis products became possible.
Here, we present the new data sets and their uncertainties as well as their combination (multi-model ensemble), considering the comparison methods discussed in Borsche et al 2015 and extending the work of Niermann et al 2019. The intention is to inform potential users in renewable energy, agriculture or hydrology how to benefit from the climatological information contained in the regional reanalyses at shorter scales (hourly to multi-annual). Long-term trends remain a challenge for reanalyses (Simmons et al 2017, Torralba et al 2017) and are beyond the scope of this paper. Here, we explore the question 'which parameters and scales can be expected to constitute a valuable data source with added value over conventional data and global reanalyses?'. The newly available individual regional reanalyses covering Europe are summarized in section 2. The added value and limitations of regional reanalyses are discussed in section 3, covering wind speed, solar radiation, precipitation and temperature fields at the sub-daily to multi-annual scale. The most relevant analyses are presented here, based on comparisons against global reanalyses, station observations, gridded station observations and gridded satellite data, as well as with the multi-model ensemble.

2. Data description and methodology
2.1. Regional reanalyses covering Europe
All regional reanalyses considered here (see table 1) cover Europe (CORDEX-EUR11 region) and are forced at their lateral boundaries by ERA-Interim (Dee et al 2011). The output grid resolution varies between 5 km and 12 km, and the output resolves the sub-daily timescale.
The regional reanalyses differ in the NWP model and data assimilation method used, reflecting the respective modelling communities: the model of the Consortium for Small-scale Modeling (COSMO), employed in the regional reanalyses produced at the Hans-Ertel Centre for Weather Research (HErZ), University of Bonn, Germany, and at Germany's national meteorological service (Deutscher Wetterdienst, DWD); the Unified Model (UM) used at the Met Office, Exeter, UK; and the Hirlam Aladin Regional Mesoscale model (HARMONIE-ALADIN). The latter is used at the Swedish Meteorological and Hydrological Institute (SMHI) and also serves as the starting point for further increases in spatial resolution. To force the SURFEX surface and soil model, Météo-France downscaled the HARMONIE-ALADIN fields by interpolation only. These downscaled fields are then refined in a surface analysis using additional observations that are not used in the 3D reanalysis, resulting in MESCAN-SURFEX. Also starting from HARMONIE-ALADIN, Météo-France ran higher-resolution ALADIN forecasts to obtain the high-resolution MESCAN forecast wind fields.
For a large range of meteorological parameters and experiments, the output is freely accessible from the common UERRA archive http://apps.ecmwf.int/datasets/data/uerra and through the Copernicus Climate Data Store (CDS) https://cds.climate.copernicus.eu/. Free access to the COSMO-based reanalysis with 6 km resolution (COSMO-REA6) is provided at ftp://opendata.dwd.de/climate_environment/REA. Coverage starts as early as 1961 for the HARMONIE-ALADIN-based regional reanalysis Version 1 (HARMONIE v1), in 1979 for the UM-based regional reanalysis (UM), and in 1995 for COSMO-REA6. For comparability, all UERRA experiments cover the period 2006-2010; several products run until 2017, and extension to near real-time is planned (COSMO-REA6) or already implemented (HARMONIE v1).
Besides the deterministic reanalysis products, several ensembles are analysed: for COSMO at 12 km resolution (COSMO-REA12 ensemble) and for UM at 36 km resolution (UM ensemble), 20 members each were calculated with perturbed observations. The MESCAN-SURFEX ensemble members have been calculated with different model physics.
Together, the deterministic UM reanalysis, the COSMO-REA12 reanalysis run with unperturbed observations assimilated by nudging, the HARMONIE v1 reanalysis and the MESCAN forecast product constitute the UERRA multi-model ensemble.
Table 1. Overview of the characteristics of the regional reanalysis products covering Europe. Note that the grid resolutions are given in degrees and converted to km for easy interpretation. For interpretation, it should also be noted that the effective resolution may be considerably coarser than the grid resolution.

[Table 1 columns: Regional reanalysis product | Grid]

Reference data and methods for comparison
Comparisons of reanalysis data with station data, gridded station data, satellite data and other reanalyses are helpful for users considering a switch from conventional data sources to reanalyses. Directly comparing point measurements to grid cells implicitly assumes that averaging in time resembles averaging over the grid cell. The main practical restriction is the availability of high-quality reference data at the spatial and temporal scales of interest. Thus, the uncertainty analysis is restricted to areas and times where high-quality data are available; this is not a constraint for comparisons with global reanalyses. Satellite data have coverage and resolution constraints; however, the Surface Solar Radiation Data Set - Heliosat (SARAH-2) from the EUMETSAT CM SAF (Pfeifroth et al 2017) allows the characterization of radiation fields over Europe. These data were re-gridded to a 0.1° × 0.1° regular latitude-longitude grid (using conservative interpolation) for comparison with the reanalyses' global solar irradiance fields. Data ownership and copyright constrain the usability of many wind and tower observations, restricting our wind evaluation to Germany. Common uncertainty measures like correlation, root mean square error (RMSE), bias, frequency distributions and ensemble spread are calculated as appropriate. All observations come with uncertainties, which affect the statistical measures mentioned above. By comparing several products to the same observations, however, differences between the products are revealed.
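The basic verification scores named above can be sketched in a few lines. The following is our minimal illustration in Python (standard library only), not the processing chain used in this study; the function names are hypothetical, and the input series are invented numbers chosen to show that a constant offset leaves the correlation untouched while producing a nonzero bias and RMSE.

```python
import math


def bias(model, obs):
    """Mean difference (model - obs); positive values mean the model overestimates."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root mean square error of the model series against the observations."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented wind speeds in m/s (not UERRA data): same variability, offset by +0.5 m/s
obs = [4.0, 6.0, 8.0, 5.0]
reanalysis = [4.5, 6.5, 8.5, 5.5]
print(bias(reanalysis, obs), rmse(reanalysis, obs), pearson_r(reanalysis, obs))
```

With these made-up values, bias and RMSE are both 0.5 m s⁻¹ while the correlation is 1 up to rounding, which is why correlation alone cannot reveal absolute biases.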

Wind
Regional reanalyses provide plenty of spatial detail in their wind fields, generally agreeing on the main features (see figure 1 for illustration) and differing only slightly in some areas. Statistical parameters such as the correlation with observations, however, indicate significant differences between the regional reanalyses (see, e.g., figure 2).
To evaluate whether the high-resolution detail added by the regional reanalyses represents reality, their correlation with available wind mast data in the North Sea and Baltic Sea as well as at Cabauw (Netherlands) and Lindenberg (Germany) for the period 2006-2010 is compared with that of ERA-Interim, using 6-hourly data to ensure comparability, as discussed in Borsche et al 2016. Correlation coefficients of the order of 0.9 are found. At some, but not all, locations the added value over the global reanalysis is statistically significant (see figure 2); 95% confidence intervals were calculated with the R package 'stats' (R Core Team 2013). The advantage of the regional reanalyses is not surprising for 10 m winds, as 10 m wind observations are assimilated in the COSMO and UM reanalyses. However, in two of the four cases shown in figure 2, this does not lead to improved correlations with observations at approximately 100 m height.
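Confidence intervals for a Pearson correlation are commonly obtained via Fisher's z-transformation; this is also how R's `cor.test()` in the 'stats' package computes them for the Pearson method. The sketch below assumes that approach, subsamples an hourly series to 6-hourly values, and uses non-overlapping 95% intervals as a simple significance criterion. That criterion and all function names are hypothetical choices for illustration: the exact test used for figure 2 is not specified here, and since regional and global correlations share the same observation series, a dedicated test for dependent correlations would be more rigorous.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def every_sixth(hourly, offset=0):
    """Subsample an hourly series to 6-hourly values (00, 06, 12, 18 UTC if the
    series starts at 00 UTC and offset=0)."""
    return hourly[offset::6]


def fisher_ci(r, n, z_crit=Z_975):
    """Approximate two-sided 95% confidence interval for a Pearson r from n pairs,
    via Fisher's z-transformation. Assumes independent pairs and n > 3; serially
    correlated data have a smaller effective n, so the interval is then too narrow."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def clearly_higher(r_a, r_b, n):
    """Hypothetical criterion: r_a counts as significantly higher than r_b if the
    lower bound of its 95% interval lies above the upper bound of r_b's interval."""
    return fisher_ci(r_a, n)[0] > fisher_ci(r_b, n)[1]


n_pairs = 1826 * 4  # 6-hourly pairs for 2006-2010 (1826 days)
print(fisher_ci(0.90, n_pairs))
print(clearly_higher(0.93, 0.90, n_pairs))
```

With 7304 six-hourly pairs, the interval around r = 0.9 is narrower than ±0.01, so small differences between correlations can already appear significant; the autocorrelation caveat in the docstring matters in practice.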
The high correlation coefficients indicate that the regional reanalyses capture the temporal variability of wind speed well on the hourly, daily and monthly scale.
However, the absolute values of the reanalysis wind speeds may be biased relative to station measurements, depending on local topographic effects and their representation in the reanalyses (Kaiser-Weiss et al 2015). As the frequency distribution of wind speed is non-Gaussian, the bias also depends on wind speed.
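Because the bias depends on wind speed, it can be useful to stratify it by wind-speed class. The following is a minimal sketch, assuming classes defined on the observed wind speed; binning on the reanalysis value instead would give different numbers, since conditioning on either variable introduces its own regression-to-the-mean effect. The class edges, function name and data are hypothetical.

```python
import bisect


def binned_bias(model, obs, edges):
    """Mean (model - obs) per observed wind-speed class [edges[i], edges[i+1]).
    Observations outside [edges[0], edges[-1]) are ignored; empty classes give None."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for m, o in zip(model, obs):
        i = bisect.bisect_right(edges, o) - 1  # index of the class containing o
        if 0 <= i < n_bins:
            sums[i] += m - o
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Invented example: weak winds overestimated, strong winds underestimated
edges = [0.0, 4.0, 8.0, 16.0]  # hypothetical class edges in m/s
obs = [1.0, 2.0, 6.0, 7.0, 12.0]
model = [2.0, 3.0, 6.0, 7.0, 10.0]
print(binned_bias(model, obs, edges))
```

In this invented case the overall mean bias is close to zero, yet the class-wise values reveal opposite errors at low and high wind speeds.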
The RMSE of an ensemble mean reflects systematic differences and is less influenced by random error. For COSMO-REA12, the ensemble-mean RMSE over Germany is 1.5 m s⁻¹ for hourly values and decreases to about 1 m s⁻¹ for longer time scales, while the UM ensemble RMSE is consistently 0.3 m s⁻¹ larger (see figure 3).
As expected from the smoothing effect of averaging, the RMSE decreases with increasing temporal averaging (figure 3). However, the spread of the regional reanalysis ensembles is smaller than the RMSE of the ensemble mean (figure 3). The spread is a measure of uncertainty in the model, whereas the RMSE measures the difference between the ensemble mean and the observations. The RMSE includes observation error and representativity error, which are not represented in the models. Spread and RMSE are therefore only directly comparable if estimates of these errors are added to the spread (Saetra et al 2004), which is difficult and not attempted here. A UERRA multi-model ensemble based on four members (the deterministic runs of UM, COSMO-REA12, HARMONIE v1 and the MESCAN forecast product) includes some uncertainty due to representativity, because of the variety of resolutions used. The resulting spread is nevertheless still too low, by a factor of about 2, on the hourly timescale (see figure 3).
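The comparison of ensemble spread with the RMSE of the ensemble mean, and its dependence on temporal averaging, can be sketched as follows. This is an illustrative Python version under stated assumptions, not the code behind figure 3: spread is taken as the square root of the time-averaged ensemble variance, temporal averaging uses non-overlapping blocks (e.g. 24 hourly values for daily means), and the observation/representativity error is a user-supplied, assumed standard deviation added in quadrature, a simplified version of the idea in Saetra et al 2004. All names are hypothetical.

```python
import math


def ensemble_mean(members):
    """Time series of the ensemble mean; members is a list of equally long series."""
    return [sum(vals) / len(vals) for vals in zip(*members)]


def ensemble_spread(members):
    """Square root of the time-averaged ensemble variance (n-1 normalisation)."""
    variances = []
    for vals in zip(*members):
        mu = sum(vals) / len(vals)
        variances.append(sum((v - mu) ** 2 for v in vals) / (len(vals) - 1))
    return math.sqrt(sum(variances) / len(variances))


def rmse(model, obs):
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def block_average(series, width):
    """Non-overlapping means, e.g. width=24 turns hourly values into daily means;
    an incomplete trailing block is dropped."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]


def spread_with_obs_error(spread, obs_error_sd):
    """Spread with an assumed observation/representativity error added in quadrature."""
    return math.hypot(spread, obs_error_sd)


def spread_skill_ratio(members, obs, width=1):
    """Spread divided by ensemble-mean RMSE after averaging over `width` time steps."""
    avg_members = [block_average(m, width) for m in members]
    avg_obs = block_average(obs, width)
    return ensemble_spread(avg_members) / rmse(ensemble_mean(avg_members), avg_obs)


# Invented two-member ensemble and observations (4 time steps)
members = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
obs = [2.0, 3.0, 4.0, 6.0]
print(spread_skill_ratio(members, obs, width=1), spread_skill_ratio(members, obs, width=2))
```

A spread-skill ratio well below 1 indicates an underdispersive ensemble; a spread too low by a factor of about 2 corresponds to a ratio of roughly 0.5.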

Solar radiation
The differences in the solar radiation fields of the various regional reanalyses are more pronounced than in the wind fields, because the models use different parametrizations of optical thickness, i.e., they differ in their cloud, aerosol and radiation modelling. Frank et al 2018 demonstrated the added value with respect to the global reanalysis and suggested a post-processing to correct the COSMO model's overestimation of radiation in cloudy conditions as well as its underestimation under clear-sky conditions, the latter due to the aerosol climatology used. The latter effect is confirmed by our comparison against the SARAH-2 satellite product: in figure 4, the high radiation values observed by the satellite are not reproduced in the COSMO-REA6 data. The difference between satellite and reanalysis radiation is caused by a combination of several effects that depend on location (see figure 5 and supplementary material S1 and S2, available online at stacks.iop.org/ERC/1/071004/mmedia) and on time, with absolute differences most pronounced in summer but relative bias largest in winter (see supplementary material S1-S5, covering each month of 2005-2010 for COSMO-REA6, UM and HARMONIE v1).
For the mid-latitudes, over European land areas, the biases approach the target accuracy of SARAH-2, which is 15 W m⁻² on the daily and 8 W m⁻² on the monthly scale (Pfeifroth et al 2016), at least for most winter months and for all three reanalyses (figure 5). At 45-55°N, this corresponds to a relative monthly bias ranging from −15% (COSMO-REA6) to 60% (UM); see suppl. material S2. Confining the analysis to the latitude band 45-55°N to avoid latitude-dependent effects, a systematic, season-dependent bias can be discerned for each reanalysis (e.g., COSMO-REA6 underestimates radiation in summer); there is also year-to-year variability (see suppl. material S3-S5).
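The relation between absolute and relative bias, and why relative bias peaks in winter, can be made concrete with a short sketch. It assumes monthly-mean irradiance on a regular latitude-longitude grid, given as flat lists of cell values, and weights cells by the cosine of latitude as an approximation of cell area; the function names and inputs are hypothetical. The SARAH-2 target accuracies (15 and 8 W m⁻² on the daily and monthly scale) are taken from the text above.

```python
import math

DAILY_TARGET = 15.0   # SARAH-2 target accuracy, daily scale, W m-2 (from the text)
MONTHLY_TARGET = 8.0  # SARAH-2 target accuracy, monthly scale, W m-2 (from the text)


def band_bias(model, sat, lats, lat_min=45.0, lat_max=55.0):
    """Area-weighted absolute bias (W m-2) and relative bias (fraction) of model
    against satellite values for grid cells with lat_min <= lat <= lat_max.
    Inputs are flat lists, one entry per grid cell; cos(latitude) approximates
    the cell area on a regular latitude-longitude grid."""
    w_sum = diff_sum = sat_sum = 0.0
    for m, s, lat in zip(model, sat, lats):
        if lat_min <= lat <= lat_max:
            w = math.cos(math.radians(lat))
            w_sum += w
            diff_sum += w * (m - s)
            sat_sum += w * s
    if w_sum == 0.0:
        raise ValueError("no grid cells in the latitude band")
    # relative bias = absolute bias / area-weighted mean satellite value
    return diff_sum / w_sum, diff_sum / sat_sum


def within_target(abs_bias, target=MONTHLY_TARGET):
    """True if the magnitude of the bias does not exceed the target accuracy."""
    return abs(abs_bias) <= target


# Invented monthly means (W m-2): two cells in the band, one at 60N (excluded)
abs_b, rel_b = band_bias([48.0, 48.0, 100.0], [40.0, 40.0, 10.0], [50.0, 50.0, 60.0])
print(abs_b, rel_b)
```

With hypothetical values, the same +8 W m⁻² bias is 20% of a winter monthly mean of 40 W m⁻² but only about 3% of a summer mean of 250 W m⁻², which is why absolute and relative biases peak in different seasons.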
The spatial correlation between the reanalysis radiation and the satellite observations is high (see figure 6) due to the general ability of the reanalyses to capture the synoptic situation. Occasionally, relatively large differences can occur, e.g., for February and March 2006 (see figure 6), suggesting that a few years of verification are not sufficient to cover all radiative situations. This may be because some situations are either hard to model or hard to observe (e.g., in months with varying snow cover). The models might have situation-dependent biases in their radiative forcing.

3.3. Temperature
The regional reanalyses are able to capture the regional temperature distribution in Europe, as shown by the frequency distribution of, e.g., the daily mean 2 m temperature (figure 7). Some model-dependent bias remains, e.g., in Scandinavia in winter (figure 7 top), with the bias depending on region and time of year (compare figure 7 bottom and figure 8). Figure 8 shows maps of the difference in mean summer 2 m temperature between the reanalyses and E-OBS. The regional biases can be large, reaching +2°C in Northern Europe in the Unified Model and −2°C in Mediterranean Europe in HARMONIE. The differences between the reanalyses and the E-OBS observations are often larger where the E-OBS input data are sparse or where the topography is complex. The latter is particularly visible over the Alpine area and along the coast of Norway. Differences are also seen in the presence of strong gradients, such as at Europe's Atlantic coast. Areas with the highest station density in E-OBS are Scandinavia, Germany, the Czech Republic, Slovenia and the Netherlands. Areas with poor coverage, where a mismatch between observations and reanalysis data may be related to a higher uncertainty in the gridded observations, are Italy, Southeast Europe and Eastern Europe.
The impact of station density on the uncertainty of the gridded observations is particularly clear near borders between countries whose station networks used in E-OBS differ strongly in density.
The problem of temperature biases and their dependence on resolution is illustrated by climate indices focusing on (moderately) extreme conditions, such as summer days and frost days: for these, the bias of the regional reanalyses relative to E-OBS can reach up to 40 days per year (see supplementary material figure S6).
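The sensitivity of threshold-based indices to bias can be illustrated with the ETCCDI definitions of summer days (daily maximum temperature above 25 °C) and frost days (daily minimum temperature below 0 °C). The sketch below uses invented temperatures, adds a constant offset, and compares the absolute-threshold count with a count relative to the series' own percentile, the kind of relative measure suggested in the conclusions. The nearest-rank percentile is one simple definition among several, the function names are hypothetical, and a constant offset is an idealisation: biases that differ between the tails and the centre of the distribution are not removed this way.

```python
import math


def summer_days(tmax):
    """ETCCDI SU: number of days with daily maximum temperature above 25 degC."""
    return sum(1 for t in tmax if t > 25.0)


def frost_days(tmin):
    """ETCCDI FD: number of days with daily minimum temperature below 0 degC."""
    return sum(1 for t in tmin if t < 0.0)


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for 0 < q <= 100 (one of several definitions)."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[k - 1]


def days_above_own_percentile(series, q=90):
    """Relative index: days exceeding the series' own q-th percentile."""
    threshold = nearest_rank_percentile(series, q)
    return sum(1 for t in series if t > threshold)


# Invented daily maxima (degC) and the same series with a constant +1 K bias
tmax = [18.0, 20.0, 22.0, 24.2, 24.6, 25.4, 26.0, 27.0, 29.0, 31.0]
tmax_biased = [t + 1.0 for t in tmax]
print(summer_days(tmax), summer_days(tmax_biased))
print(days_above_own_percentile(tmax, 80), days_above_own_percentile(tmax_biased, 80))
```

A 1 K offset moves two of ten days across the fixed 25 °C threshold, whereas the percentile-based count is unchanged because the threshold shifts with the data.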

Precipitation
The precipitation fields of the regional reanalyses show spatial structures similar to those of gridded observational datasets, as illustrated in figure 9 for the greater Alpine region, which is covered by the high-resolution APGD dataset (for Fennoscandia, covered by NGCD, see supplementary material, figure S7). Figure 9 shows the mean annual precipitation over the Alpine region. In regions of low station density, regional reanalyses capture precipitation patterns and amounts better than both the global reanalysis ERA-Interim and the gridded observational dataset E-OBS, as neither of the latter can resolve the topographic complexity of the Alps. The UERRA regional reanalyses UM and HARMONIE tend to overestimate precipitation amounts as well as frequency (not shown), especially in complex terrain. Higher-resolution regional reanalysis products such as MESCAN-SURFEX demonstrate additional value in regions with a dense station network, but perform comparably to the other products when the station density is reduced, i.e., station density is the most important factor determining the quality of the post-processed precipitation fields.
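Overestimation of both precipitation amount and frequency can be quantified with a mean annual total and a wet-day frequency. The sketch assumes daily totals in mm and a wet-day threshold of 1 mm per day (the convention used, e.g., in the ETCCDI indices), so that small model drizzle amounts do not count as wet days. The function names and data are hypothetical.

```python
def wet_day_frequency(daily_mm, threshold=1.0):
    """Fraction of days with precipitation >= threshold (mm per day)."""
    return sum(1 for p in daily_mm if p >= threshold) / len(daily_mm)


def mean_annual_total(daily_mm, n_years):
    """Mean annual precipitation (mm per year) from a daily series spanning n_years."""
    return sum(daily_mm) / n_years


def relative_difference(model_value, ref_value):
    """(model - reference) / reference, e.g. 0.25 means 25% overestimation."""
    return (model_value - ref_value) / ref_value


# Invented 8-day example (mm per day): the model adds small amounts on dry days
# (below the threshold) and turns a 0.5 mm day into a wet 1.2 mm day
reference = [0.0, 0.0, 2.0, 5.0, 0.5, 0.0, 10.0, 0.0]
model = [0.3, 0.4, 2.5, 6.0, 1.2, 0.2, 9.0, 0.6]
print(wet_day_frequency(reference), wet_day_frequency(model))
print(relative_difference(wet_day_frequency(model), wet_day_frequency(reference)))
```

Without a threshold, every day with any model drizzle would count as wet, which would inflate the apparent frequency bias far beyond what matters for most applications.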
The COSMO reanalyses show the best performance in the Alps and Fennoscandia (see figure S7). The regional reanalyses add value to the global reanalysis ERA-Interim because of their higher-resolution topography (and possibly the better representation of precipitation by the non-hydrostatic dynamics of the limited-area NWP models).

Conclusions
Regional reanalyses constitute a potentially attractive new data source for many applications. Wind speed, temperature, solar radiation and precipitation have been compared with reference data, and verification scores like bias, correlation and RMSE illustrate the value of regional reanalyses. Compared to the global reanalyses, which provide the boundary conditions, the regional reanalyses can add value (accuracy and reliability) by benefitting from their higher spatial and temporal resolution. The differences between the various physics schemes illustrate that 'physics matters', which is another argument for employing an NWP model to increase resolution.
On the other hand, the new data source comes with uncertainties, especially for threshold-based climate indices, which are highly sensitive to biases. Evaluation results differ with region, month of year, and temporal and spatial scale. For many applications, and especially where topography varies, a local bias remains, and either post-processing or an explicit allowance for bias in the application should be considered. Alternatively, relative instead of absolute measures (e.g., based on percentiles) can remedy the bias issue.
Reanalysis wind fields provide information at heights where few other data are available, and add value to ERA-Interim, especially on the hourly and daily scale. Solar radiation fields are biased, calling for latitude- and time-dependent post-processing. Temperature fields are of comparable quality to E-OBS and can help to spot issues in the reference data. The spatial distribution of annual accumulated precipitation is well reproduced by the regional reanalyses, also in areas with complex topography, capturing more small-scale features than ERA-Interim. The multi-model UERRA ensemble, based on four members (the deterministic runs of UM, COSMO-REA12, HARMONIE and the MESCAN forecast product), yields more realistic uncertainty estimates than the single-model ensembles.
Figure 9. Mean annual precipitation in the area of the Alps, in mm per year, 2006-2008, for the reference data sets APGD and E-OBS, the regional reanalyses (HARMONIE v1, UM, COSMO-REA6, COSMO-REA12), the downscaled regional reanalysis product MESCAN-SURFEX, and the global reanalysis ERA-Interim, all rescaled to the original E-OBS grid of 0.25° × 0.25°.

Outlook
Some of the products discussed here are continuously updated and developed further, e.g., in the Copernicus Climate Change Service (C3S) and at DWD (COSMO-REA6). Work remains to bridge the gap for the reanalysis user community by providing guidance and post-processing, and by sharing research results obtained in Europe with similar ventures around the globe. With global reanalyses moving to higher resolution (e.g., ERA5 with a grid resolution of 30 km), the regional reanalyses will benefit from higher-quality boundary conditions, and at the same time the question of added value will arise anew. Future reanalysis comparisons, including both regional and global reanalyses, are expected to stimulate further development of the reanalyses and to benefit a growing user community.