Long-term patterns of European PV output using 30 years of validated hourly reanalysis and satellite data

Solar PV is rapidly growing globally, creating difficult questions around how to efficiently integrate it into national electricity grids. Its time-varying power output is difficult to model credibly because it depends on complex and variable weather systems, leading to difficulty in understanding its potential and limitations. We demonstrate how the MERRA and MERRA-2 global meteorological reanalyses as well as the Meteosat-based CM-SAF SARAH satellite dataset can be used to produce hourly PV simulations across Europe. To validate these simulations, we gather metered time series from more than 1000 PV systems as well as national aggregate output reported by transmission network operators. We find slightly better accuracy from satellite data, but greater stability from reanalysis data. We correct for systematic bias by matching our simulations to the mean bias in modeling individual sites, then examine the long-term patterns, variability and correlation with power demand across Europe, using thirty years of simulated outputs. The results quantify how the increasing deployment of PV substantially changes net power demand and affects system adequacy and ramping requirements, with heterogeneous impacts across different European countries. The simulation code and the hourly simulations for all European countries are available freely via an interactive web platform, www.renewables.ninja. © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Over the past decade, photovoltaic (PV) power has rapidly become a key renewable energy technology, with global installed capacity rising from less than 1 GW in 2000 to 222 GW in 2015 [24]. Fig. 1 shows an overview of PV capacity and power demand in the ten European countries with the most PV capacity as of 2014. The integration of variable PV into existing grids is not easy, necessitating balancing mechanisms such as flexible demand and load shifting [37], power storage [11] or large-scale grid reinforcement [3]. It also makes the operation of electricity markets more difficult: technically, because more flexible capacity is required for ramping and more reserves must be held to balance out forecast errors; and financially, because zero-marginal-cost renewables suppress meaningful price signals in the wholesale market, hampering rational decision-making for investment [7,16]. Additionally, the market value of solar PV tends to decrease as its capacity increases [19], so another challenge is the design and implementation of appropriate market mechanisms for renewables-heavy power systems.
Synthesizing time series of wind and solar power as inputs for energy models to examine these issues is not trivial. Requirements for these data include sufficient spatial and temporal resolution, the preservation of correlations across space and time, and a sufficiently accurate representation of real PV plants' behavior. Naively synthesized time series, such as typical meteorological years or average availability factors, will likely lead to significant errors in studies examining high renewable shares. There are multiple commercial providers of high-quality time series data, used by project developers conducting due diligence on possible solar sites, including 3TIER¹ and Geomodel Solar². However, these are unsuitable for large-scale academic studies due to their high cost, which can reach several thousand USD for a single site's hourly time series. An alternative is to use freely available data from sources such as meteorological reanalyses or direct satellite measurements, and to feed them into PV system simulation tools. However, the significant amount of work required in data processing and configuring simulation parameters represents a major hurdle, and there has been no systematic assessment of their accuracy over wider geographic areas.
Meteorological reanalyses in particular have emerged as an important data source for renewable energy modeling studies over the past few years for several reasons: reanalysis data are usually available globally, they provide several decades of coverage, and they are usually freely available. A major advantage is that, through their integration of measurements and numerical models, reanalyses can provide data for locations or timesteps where no direct observations are available. However, potential problems like model errors or insufficient spatial resolution make it necessary to validate reanalyses against measured data for applications where high accuracy is necessary. Commonly used global reanalyses of the most recent generation include NASA's MERRA and MERRA-2 [36], the ECMWF's ERA-Interim [10], and the Japan Meteorological Agency's JRA-55 [28]. There is a wide range of recent work using reanalysis data for wind power simulation (e.g., Refs. [1,12,40,41]). However, reanalysis data are not widely used to model solar energy, likely for two main reasons. First, PV uptake expanded later than wind: global PV capacity reached 100 GW in 2012, whereas wind reached that level in 2008 [23]. Second, satellite imagery provides another freely available data source for solar irradiance with broad geographic coverage (more on this below). In addition, solar may be considered easier to model than wind because of the well-known shape of its seasonal and diurnal variation.
There are, however, some recent studies using reanalysis data for solar simulations. Ref. [18] optimizes Europe-wide wind and PV capacity mixes to balance demand (without considering transmission). The authors use a commercial provider to downscale data from the NCEP CFSR (Climate Forecast System Reanalysis) [39] to about 50 km² and 1-hourly resolution, but do not say whether they validate the resulting wind and PV simulations in any way. Ref. [17] uses the NCEP/NCAR 40-year reanalysis [27] to simulate wind, PV and concentrating solar power (CSP) plants on typical days at 6-hourly resolution, in a study examining the storage and transmission capacity requirements for a highly renewable European power system. Again, the authors do not discuss validating the meteorological input data or power outputs. Ref. [26] uses the MERRA reanalysis to estimate hourly solar PV output over 33 years in the Czech Republic, using two years of measured data to perform bias correction of the MERRA results. Ref. [20] examines the flexibility requirements for the integration of wind and PV across European power systems, finding that flexibility requirements increase strongly as variable renewables go beyond about 30% penetration. Its time series are generated with MERRA and validated against two years of nationally aggregated PV production in Germany [25], finding a correlation of 0.95 and a root mean square error (RMSE) of about 0.05, but the authors do not attempt to correct for the difference. To summarize, because existing studies perform limited or no validation (in space and time) of their reanalysis-based simulations against historical power output, the suitability of reanalysis data for simulating PV output in Europe-wide studies is not yet proven.
The global coverage of reanalysis data may come at the cost of accuracy. Ref. [4] examines two reanalyses (ERA-Interim and MERRA) and shows that their irradiance values are less accurate than satellite-derived data, frequently predicting clear skies when the sky was cloudy. Satellite images can estimate atmospheric conditions relevant for surface irradiance quite well and are thus an alternative data source. For example, satellite data are used in the PVGIS database, which provides web-accessible annual and monthly solar PV production averages across Europe at 1 km spatial resolution [43]. A freely available hourly dataset is the Surface Solar Radiation Data Set (SARAH) [31,32], provided by the CM-SAF consortium based on Meteosat images covering Europe and Africa. Its spatial resolution is higher than MERRA's, and its time range from 1983 to 2014 is similar to that of modern reanalyses. Hourly data have been made available only recently, so they have not yet seen widespread use.
Here, we introduce a database of measured PV panel outputs and use it to validate the power output from PV simulations using reanalysis and satellite datasets. Accurate irradiance data alone may be insufficient to model the output from real plants accurately: other effects such as temperature or panel shading can also play an important role, and we often do not know the exact configuration of the PV sites we wish to simulate. Thus, by comparing simulations against a range of measured power outputs, we can determine how well the performance of real systems can be simulated given our input data, and determine empirical correction factors to account for discrepancies. We then simulate national-level fleets, validate these simulations, and use them to examine the long-term patterns of European PV output and its effect on net electricity demand. The resulting data are made available on a freely accessible web platform, www.renewables.ninja, where users can simulate the hourly power output from PV panels located anywhere in the world.

Solar irradiance data
The reanalysis used here is MERRA [36] and its successor MERRA-2 [30]. One of the improvements in MERRA-2 is the inclusion of space-based aerosol observations, which suggests that it could have better accuracy for the purpose of modeling solar power. We use MERRA to refer to both the original MERRA and MERRA-2, unless specifically discussing the differences between them. MERRA has several advantages over other reanalyses: it provides data at 1-hourly intervals, rather than 3- or 6-hourly steps, and its spatial resolution is 1/2° latitude by 2/3° longitude³, which translates to roughly 50 × 50 km across Europe. The CM-SAF SARAH satellite-derived irradiance dataset is used for comparison [31,32]. It is available at a considerably higher spatial resolution of 0.05° × 0.05°, also at hourly time intervals. With MERRA, the direct irradiance (i.e., the discrete "beam" from the sun) and the diffuse irradiance (scattered in the atmosphere by clouds, aerosols, etc.) are estimated using the ground-level global irradiance (SWGDN) and top-of-atmosphere irradiance (SWTDN) variables, as described below. In addition, the MERRA T2M variable (temperature at 2 m above the displacement height⁴) is used as an estimate of ambient temperature. SARAH has some periods of missing data. For the analysis performed here, missing periods of 6 h or shorter are interpolated from neighboring values. Longer periods are filled by taking data for the same dates from the preceding year (or the subsequent year, if the gap falls in the first year of data), and adjusting by the between-year difference in the mean of the 7 days before and after the missing period. The amount of missing data and the difference between raw and filled SARAH data are shown later in Fig. 11.

Fig. 1. Electricity demand and installed PV capacity in European countries. The top line compares the minimum demand with the installed PV capacity, resulting in an indicator comparable across countries, but does not consider the temporal correlation of demand and PV production. The x-axis is labeled with ISO 3166 two-letter country codes.
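The SARAH gap-filling rule described in this section can be sketched in Python as follows. This is a minimal illustration, not the code used for this paper: it assumes an hourly series held in a plain list with `None` marking missing values, at least two years of data, 8760-hour years (no leap days), and an additive between-year adjustment (the text does not say whether the adjustment is additive or multiplicative). The function name `fill_sarah_gaps` is hypothetical.

```python
from typing import List, Optional

HOURS_PER_YEAR = 8760  # assumption: 365-day years, no leap days
WINDOW = 7 * 24        # hours in the 7-day windows either side of a gap


def _gaps(series):
    """Yield (start, end) index pairs of runs of missing values (end exclusive)."""
    i, n = 0, len(series)
    while i < n:
        if series[i] is None:
            j = i
            while j < n and series[j] is None:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def _window_mean(series, start, end):
    """Mean of available values in the 7 days before and after [start, end)."""
    idx = list(range(max(0, start - WINDOW), start)) + list(
        range(end, min(len(series), end + WINDOW)))
    vals = [series[k] for k in idx if series[k] is not None]
    return sum(vals) / len(vals) if vals else 0.0


def fill_sarah_gaps(series: List[Optional[float]], max_interp_hours: int = 6):
    """Return a copy of an hourly series with missing values filled."""
    out = list(series)
    for start, end in _gaps(series):
        length = end - start
        if length <= max_interp_hours and start > 0 and end < len(series):
            # Short gap: linear interpolation between the neighbouring hours.
            left, right = series[start - 1], series[end]
            for k in range(start, end):
                frac = (k - start + 1) / (length + 1)
                out[k] = left + frac * (right - left)
            continue
        # Long gap: use the same hours of the preceding year, or of the
        # subsequent year if the gap lies in the first year of data.
        offset = -HOURS_PER_YEAR if start >= HOURS_PER_YEAR else HOURS_PER_YEAR
        # Additive adjustment (assumption) for the between-year difference in
        # the mean of the surrounding 7-day windows.
        shift = (_window_mean(series, start, end)
                 - _window_mean(series, start + offset, end + offset))
        for k in range(start, end):
            donor = series[k + offset]
            # Irradiance cannot be negative, so clip at zero; donor hours that
            # are themselves missing stay missing.
            out[k] = None if donor is None else max(0.0, donor + shift)
    return out
```

Gaps are located on the raw series, so values filled for one gap are never reused as donors for another; this keeps the result independent of the order in which gaps are processed.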

PV power output model
The Global Solar Energy Estimator (GSEE) model is used to model PV power output, as outlined in Fig. 2. First, values are linearly interpolated from grid cells to the given coordinates. For MERRA, the diffuse irradiance fraction is estimated with the BRL model [29,35], as it has been shown to perform best amongst a variety of similar models [44]. The BRL model requires a clearness index, which is estimated as the ratio of ground-level irradiance to top-of-atmosphere irradiance from the MERRA data. SARAH provides both direct and global irradiance, removing the need to estimate diffuse irradiance. Next, irradiance on the plane of the PV panel is computed. In the case of a fixed azimuth angle (the compass direction a panel is facing) and a fixed tilt angle, the plane incidence angle $\theta$ is given by

$$\cos\theta = \cos h \, \cos(a_p - a_s) \, \sin t + \sin h \, \cos t$$

where $h$ is the sun altitude, $a_p$ is the panel azimuth, $t$ is the panel tilt, and $a_s$ is the sun azimuth angle. The direct and diffuse in-plane irradiance ($I_{dir,p}$ and $I_{dif,p}$) can then be computed from the direct and diffuse irradiance ($I_{dir}$ and $I_{dif}$) by

$$I_{dir,p} = I_{dir} \cos\theta$$

$$I_{dif,p} = I_{dif} \, \frac{1 + \cos t}{2} + a \, (I_{dir} + I_{dif}) \, \frac{1 - \cos t}{2}$$

where $a$ is the surface albedo (set to 0.3 here). The model can also simulate tracking systems with a single axis (adjusted tilt with a horizontal or tilted tracking axis) or two axes (both tilt and azimuth, such that the incidence angle is always zero) using different calculations, which are not reproduced here since all the validation data and simulations presented in this paper represent fixed panels.
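The clearness index, the BRL diffuse split and the fixed-panel geometry described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GSEE source: all function names are hypothetical, the BRL predictors other than the hourly clearness index (apparent solar time, daily clearness index, persistence) are assumed to be computed elsewhere, and the default BRL coefficients are the values commonly quoted for the original fit, which should be checked against Refs. [29,35] before use. The transposition simply mirrors the isotropic-sky equations for the in-plane direct and diffuse irradiance with albedo 0.3; how the beam input is referenced (horizontal or normal to the sun's rays) is left to the caller.

```python
import math

# Commonly quoted BRL coefficients (beta0..beta5); an assumption to be
# verified against Refs. [29,35] before use.
BRL_BETA = (-5.38, 6.63, 0.006, -0.007, 1.75, 1.31)


def clearness_index(ghi, toa):
    """Ratio of ground-level to top-of-atmosphere irradiance (0 when TOA is 0)."""
    return ghi / toa if toa > 0 else 0.0


def brl_diffuse_fraction(kt, ast_hours, altitude_deg, kt_daily, persistence,
                         beta=BRL_BETA):
    """Diffuse fraction from the logistic BRL model.

    kt: hourly clearness index; ast_hours: apparent solar time;
    altitude_deg: solar altitude in degrees; kt_daily: daily clearness
    index; persistence: persistence of the clearness index.
    """
    b0, b1, b2, b3, b4, b5 = beta
    z = (b0 + b1 * kt + b2 * ast_hours + b3 * altitude_deg
         + b4 * kt_daily + b5 * persistence)
    return 1.0 / (1.0 + math.exp(z))


def incidence_angle(h, a_s, t, a_p):
    """Incidence angle theta on a fixed panel; all angles in radians.

    cos(theta) = cos(h) cos(a_p - a_s) sin(t) + sin(h) cos(t)
    """
    cos_theta = (math.cos(h) * math.cos(a_p - a_s) * math.sin(t)
                 + math.sin(h) * math.cos(t))
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding noise


def plane_irradiance(i_dir, i_dif, theta, t, albedo=0.3):
    """Direct and diffuse in-plane irradiance (isotropic sky, ground albedo)."""
    # Clipping at zero (sun behind the panel) is our addition.
    i_dir_p = max(0.0, i_dir * math.cos(theta))
    i_dif_p = (i_dif * (1 + math.cos(t)) / 2
               + albedo * (i_dir + i_dif) * (1 - math.cos(t)) / 2)
    return i_dir_p, i_dif_p
```

As a sanity check of the geometry, a south-facing panel tilted 30° sees the sun head-on when the sun stands due south at 60° altitude, since 60° + 30° = 90°; `incidence_angle` then returns (numerically) zero.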
Finally, the power output from a given panel is calculated from the in-plane irradiance determined in the previous step. This is done using the relative PV performance model described in Ref. [21], which gives temperature-dependent panel efficiency curves. Panel temperature is estimated from ambient temperature, taking into account the heating effect of irradiance. One of our sources of measured hourly PV output (DTI, see below) provides panel and ambient temperature data for each site, making it possible to derive an empirical relation between the two. This yields a best fit of about 0.025 °C W⁻¹ m². Ref. [21] also gives coefficients for free-standing and building-integrated modules. When comparing model error using these two values and our own empirical value, the value for free-standing modules given by Ref. [21] results in the best match with measured data across all sites, as seen in Table 1. As we have no more detailed information about the specific setup of individual sites, we use that value as the default assumption for all simulations.
Additional losses are caused by the PV system's components, primarily the inverter (which converts a panel's DC output into AC power for on-site use or export to the power grid), and these are estimated with an additional static loss. In addition to temperature data, the DTI data contain DC and AC output, which allows inverter efficiencies to be estimated. The mean efficiency across all sites is 0.90, with a standard deviation of 0.04. This suggests that a reasonable assumption for inverter losses is 10%, which is used for all simulation results presented here. This is a conservative assumption, since the systems in the DTI dataset are about 15 years old, and newer inverters may perform better.
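The temperature and loss chain of the two paragraphs above can be sketched as follows, expressing output per kW of nominal capacity so that it is directly an hourly capacity factor. This is an illustrative sketch, not the GSEE implementation: the relative efficiency uses the logarithmic-polynomial form of the model in Ref. [21], but the coefficients `K` are values we assume for crystalline silicon and should be verified against that reference. The linear panel-temperature model, its default coefficient (the DTI-derived 0.025 °C W⁻¹ m² fit rather than the free-standing value actually used for the simulations) and all function names are likewise assumptions.

```python
import math
import statistics

# Assumed crystalline-silicon coefficients k1..k6 for the relative efficiency
# model of Ref. [21]; verify against that reference before use.
K = (-0.017162, -0.040289, -0.004681, 0.000148, 0.000169, 0.000005)


def panel_temperature(t_ambient, g_plane, c_irrad=0.025):
    """Module temperature (degC) = ambient + c_irrad * in-plane irradiance (W/m2)."""
    return t_ambient + c_irrad * g_plane


def relative_efficiency(g_plane, t_module, k=K):
    """Efficiency relative to standard test conditions (1000 W/m2, 25 degC)."""
    if g_plane <= 0:
        return 0.0
    lg = math.log(g_plane / 1000.0)  # log of normalized irradiance G'
    dt = t_module - 25.0              # temperature difference T'
    k1, k2, k3, k4, k5, k6 = k
    return (1 + k1 * lg + k2 * lg ** 2
            + dt * (k3 + k4 * lg + k5 * lg ** 2) + k6 * dt ** 2)


def ac_output_per_kw(g_plane, t_ambient, inverter_loss=0.10):
    """AC output per kW of nominal DC capacity, i.e. an hourly capacity factor."""
    t_mod = panel_temperature(t_ambient, g_plane)
    dc = (g_plane / 1000.0) * relative_efficiency(g_plane, t_mod)
    return dc * (1.0 - inverter_loss)  # static 10% inverter/system loss


def mean_inverter_efficiency(dc_kwh, ac_kwh):
    """Mean and standard deviation of AC/DC ratios from paired site records."""
    ratios = [ac / dc for dc, ac in zip(dc_kwh, ac_kwh) if dc > 0]
    return statistics.mean(ratios), statistics.stdev(ratios)
```

At standard test conditions the relative efficiency is exactly 1, so 1000 W m⁻² with the panel at 25 °C yields 0.9 after the 10% loss; a hotter panel at the same irradiance yields less. `mean_inverter_efficiency` illustrates how paired DC/AC records such as those in the DTI data could give the 0.90 ± 0.04 figure.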
To estimate the total PV output from different European countries, we simulate a PV power plant in each MERRA grid cell (i.e., roughly a 50 × 50 km grid), and apportion the cells to the given country. For example, this results in 135 grid points for Germany and 102 for the UK. The same grid points are used for the SARAH simulations, ignoring the higher spatial resolution of this dataset. Two types of simulations are run for each of MERRA, MERRA-2 and SARAH: first, a panel with optimal alignment (south-facing azimuth and latitude-dependent tilt angle) in each cell, and second, a panel with randomized angles in each cell. The random alignment is produced by drawing from a normal distribution that we find represents the variety seen in real-world installations. For the azimuth angle, the distribution has a mean of 180° and a standard deviation of 40°; for the tilt angle, a mean of 25° and a standard deviation of 15°.
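A minimal sketch of the randomized orientation draw, one panel per grid cell, is shown below. It assumes Python's `random.Random.gauss` for the normal draws and a fixed seed for reproducibility; wrapping the azimuth into [0°, 360°) and clipping the tilt to [0°, 90°] are our assumptions, since the text specifies only the two distributions. The function names are hypothetical.

```python
import random


def random_orientation(rng, az_mean=180.0, az_sd=40.0, tilt_mean=25.0, tilt_sd=15.0):
    """Draw one (azimuth, tilt) pair in degrees for a grid cell's panel."""
    azimuth = rng.gauss(az_mean, az_sd) % 360.0                  # wrap to [0, 360)
    tilt = min(90.0, max(0.0, rng.gauss(tilt_mean, tilt_sd)))  # clip (assumption)
    return azimuth, tilt


def orientations_for_cells(n_cells, seed=42):
    """One randomized orientation per grid cell, reproducible via the seed."""
    rng = random.Random(seed)
    return [random_orientation(rng) for _ in range(n_cells)]
```

For Germany's 135 grid cells, `orientations_for_cells(135)` would give one (azimuth, tilt) pair per cell, each then fed into a fixed-panel simulation of that cell. Clipping matters mainly for tilt: with a mean of 25° and a standard deviation of 15°, roughly one draw in twenty would otherwise fall below zero.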

PV power output data
Three sources were used to procure time series of hourly power output from individual PV plants, as described in more detail in the supplementary material. Fig. 3 shows how the sites are concentrated in a small number of European countries. This is due to the data sources used and the difficulty of obtaining measured data at a high enough temporal resolution. While the dataset contains entries from a total of 25 countries, the three biggest contributors are the UK (n = 438), Germany (n = 259), and Italy (n = 82). The types of data loggers used at the PV sites and the measurement error they introduce were not further considered. Ref. [33] explicitly states a 2% accuracy for power output readings in the DTI dataset. For the PVLog and PVOutput data, we assume that the loggers conform to the International Electrotechnical Commission (IEC) 61724:1998 standard for PV system data, which stipulates that a logger's accuracy with respect to electrical power should be better than 2% of its reading [22].
National-level data were acquired to validate nationally aggregated simulations. Installed PV generation capacities for European countries were derived by taking the mean from EurObserv'ER [15], IRENA [23], ENTSO-E [13], Eurostat, and BP [6]. Annual country-level PV power output data are also available from all these sources except IRENA. Hourly power output data for the UK, Germany, France, Italy and the Czech Republic were obtained via their transmission network operators (TNOs). Finally, hourly demand data for all European countries were obtained from ENTSO-E. More detail on these sources is available in the supplementary material. All time series data from both site-level and national-level sources were converted into UTC, so all times shown in figures in this paper are in UTC.

Analysis of site-level data
Fig. 4 summarizes the mean capacity factors from the site-level data, aggregated to the country level, for those countries with at least 10 available sites. The pattern is more or less as expected, with a trend towards higher capacity factors in the southern countries. The histograms show the distribution across sites for the three countries with the most measured data. These individual panels appear to be representative of each country's national average capacity factor (presented next), with the notable exception of Spain. As shown in Fig. 3, the Spanish panels are concentrated on the northern coast, where insolation is lowest, and the number of panels (n = 14) is too small to give a statistically representative sample.
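The site-to-country aggregation behind Fig. 4 can be sketched as follows. This is a minimal illustration, assuming each site is summarized as a hypothetical (country code, energy in kWh, nominal capacity in kW, hours covered) tuple and that the country value is an unweighted mean of site capacity factors; the text does not say whether sites were weighted by capacity.

```python
from collections import defaultdict
from statistics import mean


def capacity_factor(energy_kwh, capacity_kw, hours):
    """Energy produced divided by the energy at full nominal output."""
    return energy_kwh / (capacity_kw * hours)


def country_mean_cf(sites, min_sites=10):
    """Unweighted mean site capacity factor per country.

    sites: iterable of (country_code, energy_kwh, capacity_kw, hours).
    Countries with fewer than `min_sites` sites are dropped.
    """
    by_country = defaultdict(list)
    for country, energy, capacity, hours in sites:
        by_country[country].append(capacity_factor(energy, capacity, hours))
    return {c: mean(cfs) for c, cfs in by_country.items() if len(cfs) >= min_sites}
```

The `min_sites` threshold mirrors the cut-off of 10 sites used for Fig. 4; as the Spanish case shows, meeting it does not guarantee a representative sample.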
The angle at which a solar panel is installed, and whether it tracks the sun or not, can have a significant effect not only on the total annual power production, but also on the shape of the power production curve through the day, as the sun's rays hit the panel at a more or less ideal angle depending on the time of day. A total of 831 measured sites have panel angle metadata associated with them. Fig. 5a shows their distribution of azimuth angles, with 180° meaning a perfect southward alignment (in the northern hemisphere). While there is a clear tendency for panels to face south, there is some degree of spread. The assumption made for the PVLog data, where no metadata on angles is available, may therefore result in an overestimate of PVLog panel outputs. Most of the azimuth angle metadata is relatively coarse (and so are the bins in the histogram), since the PVOutput database, which contributes the most sites to the overall dataset, records only compass directions (such as "N" or "NW"). For simulating a large number of locations, this implies that simulating a spread of orientations around the optimal south-facing alignment is likely to give a more realistic output.
There are various methods described in the literature to determine an optimal tilt angle for PV installations, some using only latitude [9], others using a more complex approach that includes local climatic conditions to account for diffuse as well as direct irradiance [2]. In practice, tilt angles for small-scale rooftop installations are often determined by the roof angle, but installations at higher latitudes should generally have steeper angles to better capture the incoming sunlight. As shown in Fig. 5b, the collected metadata do indicate that higher latitudes have higher angles, although there is considerable spread around the linear regression line. Six percent of panels with tilt angle metadata have an angle of 5° or lower, so they are essentially lying flat. The two vertical dotted lines indicate the range of latitude within which Germany falls. The assumption used for PVLog tilt angles, 35°, corresponds to the median value from all PVOutput panels with metadata in Germany.

Simulating individual sites
We now investigate how well individual PV sites can be simulated by MERRA and SARAH. Fig. 6 shows considerable spread in how well the average capacity factors of different sites are modeled. MERRA and MERRA-2 generally overestimate the site output, which is consistent with the literature [47], while SARAH generally underestimates it (when including the 10% inverter loss described above). The fact that MERRA overestimates compared to SARAH is not surprising. We would expect the satellite-derived SARAH to resolve irradiance-relevant weather events that are not properly modeled in MERRA, because of the latter's low spatial resolution (and thus neglect of local topography), inaccurate cloud modeling, and in particular an overestimation of atmospheric transparency during clear-sky conditions, for example due to insufficient consideration of aerosols [46,47]. What is perhaps more surprising is that at the aggregate scale, uncorrected SARAH seems to be no more accurate than uncorrected MERRA, so for reliable results, simulations based on either of these datasets should apply some correction. Furthermore, we note that while the spread of MERRA-2 errors is different, it does not perform substantially better than MERRA. Thus, for the purposes of modeling PV output in Europe, it should not matter much which of the two is used.

Fig. 4. Average capacity factors from site-level data aggregated to the country level, for countries with ≥ 10 sites available. The reliability of these estimates is directly related to the number of sites.
While both datasets exhibit systematic biases, SARAH models the shape of power output more accurately.Fig. 7 demonstrates the hourly output pattern for MERRA and SARAH for an example site in the Czech Republic, showing how SARAH resolves some events with substantially more accuracy.The figure also shows the flattening effect of including inverter capacity: on the 19th and 20th of May panel output goes above inverter capacity and is therefore cut off in the modeled data, leading to high agreement between modeled and measured time series.
A more systematic investigation of model errors is shown on the left-hand side of Fig. 8, which plots the RMSE⁵ of the average daily capacity factor from all simulations (on a scale from 0 to 1) for the three different sources of validation data. While the magnitude of errors is similar across all three sources of measured data, the PVOutput dataset is consistently modeled with the lowest error. Although we did not explicitly test this, it is plausible that this stems from the quality of available metadata, which is most detailed in the PVOutput data. What becomes apparent in this figure is that despite the systematic biases in both MERRA and SARAH, the model error in SARAH is lower, as one would expect given the example data in Fig. 7. Again, it is also clear that the difference between MERRA and MERRA-2 is minor in comparison.
When examining the RMSE of hourly capacity factors, it is apparent that errors are significantly larger, with many sites now showing an RMSE as high as 0.1. We also see that the PVLog data suffer from consistently worse simulation results, as shown on the right-hand side of Fig. 8. This could suggest that the simplistic assumptions used to fill missing metadata compromise the simulation of hourly power outputs from individual sites. Furthermore, the accuracy of neither MERRA nor SARAH is likely to be considered sufficient for detailed studies of the performance of individual sites. One potential source of error is the estimation of panel temperature and the associated relative efficiency loss. As shown by the range of values in Table 1, better site-specific information on panel heating behavior could improve the site-level simulation results.
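The difference between daily and hourly RMSE can be made concrete with the sketch below, which computes both for a pair of hourly capacity-factor series. It assumes series that start at 00:00 UTC and contain whole days, and uses a toy one-day profile (not measured data) in which the model is one hour late. The daily means then agree exactly, so the daily RMSE is zero, while the hourly RMSE is not: timing errors of this kind are one reason hourly errors are larger.

```python
import math


def rmse(modeled, measured):
    """Root mean square error between two equal-length sequences."""
    if len(modeled) != len(measured):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(modeled, measured)) / len(measured))


def daily_means(hourly):
    """Average each consecutive block of 24 hourly values (whole days only)."""
    return [sum(hourly[i:i + 24]) / 24 for i in range(0, len(hourly) - 23, 24)]


# Toy one-day capacity-factor profile and a copy that is one hour late.
measured = [0.0] * 6 + [0.1, 0.3, 0.5, 0.6, 0.6, 0.5, 0.3, 0.1] + [0.0] * 10
modeled = [0.0] + measured[:-1]

daily_rmse = rmse(daily_means(modeled), daily_means(measured))
hourly_rmse = rmse(modeled, measured)
```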
Fig. 9 illustrates both seasonal and diurnal output patterns in the measured and modeled data, aggregated for the UK and Germany. The figures show the average daily power production profile for each season, from the mean across all sites in a country, comparing MERRA-2 (leaving out MERRA for clarity) and SARAH simulations against measured power output. From these figures, there is no clear advantage for either MERRA or SARAH. SARAH underpredicts, particularly in spring and summer, in both countries. The comparison with the TNO-reported data suggests that the sample of sites is less representative in the UK than in Germany, as the measured data are substantially lower than the TNO-reported data in spring and summer. Two questions emerge from the results so far. First, are there corrections that can be applied to the MERRA and SARAH simulations to improve their fit with measured power outputs? Second, to what extent do the difficulties with the simulations laid out above affect aggregated time series across wider geographic regions? To answer these questions, we now turn to the simulation of nationally aggregated time series.

Fig. 6. Histograms of the difference between modeled and measured capacity factors for the three simulation data sources used. A positive value means the modeled data overestimate the capacity factor. The long tail to the right could suggest some panels which are under-performing in the field, hence both MERRA and SARAH overpredict by 15–20 percentage points. This could be due to shading, misconfiguration or downtime, which are unreported and not represented in the model.

Analysis of national-level data
As described above, we use five sources of aggregate national-level annual PV power production and installed capacities: EurObserv'ER, IRENA, ENTSO-E, Eurostat, and BP. One problem is immediately apparent in Fig. 10: the different data sources do not necessarily agree on the capacity factors for specific countries. The capacity factors are computed with estimated mid-year installed capacities based on linear interpolation, so the error can come from the installed capacity data, the power production data, or both. IRENA only provides installed capacities, not production data, so it is not shown in this figure. For comparison, we show the uncorrected average capacity factors from the randomized national-scale simulations from MERRA and SARAH. In addition, we show the results from Ref. [20], which uses MERRA-based simulations across Europe. It becomes clear that the various simulations and measured data do not agree with one another, except in the case of Germany. Reasons for this likely include the relatively recent rapid growth of PV, leading to inaccurate statistics for both installed capacity and power production, and the small-scale nature of much PV deployment, which makes accurate statistics more challenging to produce in any case (in contrast to most other types of power generation, including wind power).
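The mid-year capacity calculation described above can be sketched as follows, assuming that each source reports year-end installed capacity, that the interpolation is a straight line between consecutive year-ends, and that a year has 8760 hours. The function names are hypothetical, and the figures in the usage note are made-up illustrations, not data from any of the five sources.

```python
def mid_year_capacity(cap_prev_year_end, cap_this_year_end):
    """Linear interpolation between two year-end capacities, evaluated mid-year."""
    return 0.5 * (cap_prev_year_end + cap_this_year_end)


def annual_capacity_factor(generation_gwh, cap_prev_gw, cap_this_gw, hours=8760):
    """Annual capacity factor using the interpolated mid-year capacity."""
    return generation_gwh / (mid_year_capacity(cap_prev_gw, cap_this_gw) * hours)


def cf_range_across_sources(estimates):
    """Smallest and largest capacity factor implied by several data sources.

    estimates maps a source name to (generation_gwh, cap_prev_gw, cap_this_gw).
    """
    cfs = [annual_capacity_factor(*values) for values in estimates.values()]
    return min(cfs), max(cfs)
```

With made-up figures of 10 GW and 12 GW at the ends of two consecutive years and 11,563.2 GWh generated, the mid-year capacity is 11 GW and the capacity factor is 11,563.2 / (11 × 8760) = 0.12. A second source reporting 1 GW less capacity at both year-ends would give 0.132 for the same generation, which illustrates how disagreement in capacity statistics alone spreads the implied capacity factors.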

Simulating national fleets
Fig. 11 shows the uncorrected simulated annual mean PV capacity factor across Europe from 1985 to 2014. It is clear that SARAH, due to its significant amount of missing data, particularly prior to 1995, cannot deliver long-term consistent time series as readily as MERRA. The approach taken here to fill these gaps, which is to take data from neighboring years and adjust them to account for inter-year differences, sacrifices the overall consistency of the time series when long missing periods are filled. Thus, while SARAH is more accurate on an hourly basis, MERRA seems more suitable for long-term studies (which is what reanalyses are intended for), at least in the absence of more substantial pre-processing work on SARAH to clean missing data. For the applications described below, we therefore use the MERRA-2 simulations.
Fig. 11 shows a picture consistent with the results from validating individual sites above: MERRA generally predicts higher capacity factors than SARAH. To correct for this, a first approach is to apply a single correction factor across Europe. Based on the systematic biases shown in Fig. 6, we derive multiplicative scaling factors such that each national-scale simulation is shifted by the average error, in absolute percentage points, that we found for the site-specific simulations. SARAH, for example, under-predicts CFs by 0.011 (1.1 percentage points), and the resulting correction factor for both random and optimal simulations is about 1.098. The correction factors are listed in the supplementary material. The supplementary material also describes an alternative correction approach using linear regression for countries where hourly TNO-reported data are available, but we find that this does not lead to significant overall improvement, so for the remaining applications presented here, the Europe-wide mean corrections are applied. Further work and more data on measured PV output will be required to better determine temporal and country-specific biases.
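The scaling step can be sketched as follows. This is a minimal illustration, assuming the bias is expressed as measured minus modeled capacity factor (so SARAH's under-prediction of 0.011 enters as +0.011) and that one factor is applied uniformly to every hour; capping scaled hourly values at 1 is our addition and means the corrected mean can fall slightly short of the target. Working backwards from the figures in the text, a factor of about 1.098 for a bias of 0.011 corresponds to an uncorrected mean capacity factor of roughly 0.011 / 0.098 ≈ 0.112.

```python
def correction_factor(mean_simulated_cf, site_bias):
    """Multiplier that shifts a simulation's mean CF by the mean site-level bias.

    site_bias is measured minus modeled CF (0.011 means the simulation
    under-predicts by 1.1 percentage points; negative for over-prediction).
    """
    return (mean_simulated_cf + site_bias) / mean_simulated_cf


def apply_correction(hourly_cf, factor):
    """Scale every hourly CF, capping at 1 (the cap is our assumption)."""
    return [min(1.0, cf * factor) for cf in hourly_cf]
```

A multiplicative factor preserves zero output at night and scales daytime hours in proportion to their output, which an additive shift of 0.011 in every hour would not.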
In addition to the long-term accuracy of trends, for countries where we have hourly time series of the nationally aggregated PV output reported by the TNOs (the Czech Republic, Italy, France, Germany, and the UK), we can examine how well our simulations replicate these data. Table 2 shows the results. We see that bias-correcting the simulations reduces the error in both randomized and optimal simulations, and that the randomized simulations are superior to the optimal ones. This is what we would expect: simulations with randomized orientation should match real-world output better than simulations where all panels are aligned optimally, given that real-world installations are not all optimal. We thus use the randomized simulations for all applications described below.
There are multiple problems with simulating nationally aggregated fleets. While we can obtain precise data on the power output from individual sites, we do not know the real output from a national fleet, as it is too widely distributed to be centrally metered; nor do we know the exact composition of all the individual sites making up that output. The output reported by TNOs is not necessarily more accurate than our own simulations; for example, the UK output is currently estimated by National Grid, the TNO, using weather data from a set of representative sites [45]. Thus, while we can validate against the outputs reported by the TNOs, those estimates themselves are uncertain. Nevertheless, they are the best estimates we have of national-scale PV production. Fig. 12 shows, aggregated to 7-day means for better readability, the fit between corrected simulations and TNO-reported data for 2014 in France, for which an accurate estimate of installed capacity (and thus capacity factors) is available. For comparison, it also shows the summed output from the 16 individually measured French sites and their simulations. We see that the 16 individual sites deviate to some extent from the TNO-reported output, but they give a reasonable representation of the broad trends.
These results suggest initial answers to the questions posed at the end of the previous section. It appears that applying even a simple linear correction improves the fit of simulated to measured data and, perhaps more importantly, that some of the simulation difficulty averages out when aggregating over wider geographic scales. After correction for biases, both MERRA-based and SARAH-based solar simulations aggregated to country-scale or regional energy system models are likely sufficient for many types of energy modeling studies, where the remaining uncertainty in PV power output will be just one among many other input data and model uncertainties. Interestingly, while SARAH performs better on an hour-by-hour basis and for individual sites, it requires more work to clean missing and erroneous values, and due to SARAH's longer-term bias, particularly before 1995, MERRA may be the better choice for long-term studies.

Applications
Having several decades of PV simulations of known quality lets us explore long-term trends and patterns in solar output across Europe. Using thirty years of simulated data, from 1985 to 2014, we can explore seasonal and diurnal variability while taking rare weather events into account. Analogous figures to the ones presented here are given for nine European countries in the supplementary material. Fig. 13 shows the mean daily capacity factor for each day of the year across the 30 years of simulations for the UK. The median (in black) shows a clear seasonal trend, but there is also considerable spread: even in midsummer some days have a capacity factor well below 10% (equivalent to a sunny day in February), which is perhaps of little surprise to British residents.
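The per-day-of-year summary behind a figure like Fig. 13 can be sketched as below: hourly capacity factors are collapsed into daily means, and each calendar day is then summarized across all simulated years. This is a hypothetical sketch that assumes hourly series starting at midnight and per-year daily series aligned to the same calendar days (e.g. with leap days removed).

```python
import statistics


def daily_means(hourly_cf):
    """Collapse an hourly capacity-factor series (starting at midnight) into daily means."""
    return [sum(hourly_cf[d * 24:(d + 1) * 24]) / 24 for d in range(len(hourly_cf) // 24)]


def day_of_year_summary(daily_by_year):
    """For each calendar day, return (min, median, max) of the daily CF across all years.

    daily_by_year: one list of daily CFs per simulated year, assumed aligned so that
    index d refers to the same calendar day in every year (e.g. leap days removed).
    """
    n_days = min(len(year) for year in daily_by_year)
    summary = []
    for d in range(n_days):
        values = [year[d] for year in daily_by_year]
        summary.append((min(values), statistics.median(values), max(values)))
    return summary


def count_days_below(daily_cf, threshold=0.10):
    """Number of days whose mean capacity factor falls below `threshold`."""
    return sum(cf < threshold for cf in daily_cf)
```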
We further examine diurnal variability in Fig. 14, first by looking at mean seasonal days for summer (June, July, August) and winter (December, January, February), which shows how much lower the median hourly capacity factor is in winter. In the UK, the worst 10% of summer days are still better than roughly 75% of winter days. On the right-hand side of the figure, we examine day-to-day variability in output, measured as the capacity factor difference between pairs of adjacent days. As one would expect, these differences are approximately normally distributed around zero; in other words, we would usually expect a day to show roughly the same capacity factor as the preceding day. Nevertheless, there is a tail of variability extending beyond 10 percentage points. Given that the median daily capacity factor does not go much beyond 15% even in summer (see Fig. 13), this is a considerable day-to-day change. This highlights the importance of using a solid meteorological basis to capture the variable nature of solar PV output.
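The day-to-day analysis amounts to differencing the daily capacity-factor series; the sketch below also includes a simple percentile comparison of the kind behind the summer-versus-winter statement. The function names are hypothetical, and the default threshold of 0.10 corresponds to the 10 percentage points mentioned above.

```python
def day_to_day_changes(daily_cf):
    """Capacity-factor difference between each pair of adjacent days (later minus earlier)."""
    return [later - earlier for earlier, later in zip(daily_cf, daily_cf[1:])]


def share_of_large_changes(daily_cf, threshold=0.10):
    """Fraction of adjacent-day changes larger in magnitude than `threshold`
    (0.10 corresponds to 10 percentage points of capacity factor)."""
    changes = day_to_day_changes(daily_cf)
    return sum(abs(c) > threshold for c in changes) / len(changes)


def percentile_rank(value, sample):
    """Share of `sample` strictly below `value`, e.g. how many winter days a given summer day beats."""
    return sum(x < value for x in sample) / len(sample)
```

For instance, passing the 10th-percentile summer daily capacity factor and the list of winter daily capacity factors to `percentile_rank` gives the share of winter days that the worst summer days still outperform.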
Fig. 12. Weekly mean capacity factor in France, comparing corrected national-level simulations with TNO-reported output (above), and weekly mean capacity factors from all individual sites in France, comparing simulations against measured site data and TNO-reported output (below). Figures for additional countries in the supplementary material.

Going beyond the analysis of variability, we can examine the impact of increasing PV deployment on output patterns and on its correlation with demand. For this, we use the hourly demand data currently available to us, which disregards possible (and indeed likely) future changes to the shape of demand, for example due to the electrification of transport and heat or generally shifting consumption patterns [5]. With this limitation in mind, Fig. 15 uses 2014 demand data. In the top part of the figure, for Britain, panel (a) compares a histogram of hourly PV production in 2014, as reported by the TNO, with hourly reported demand minus PV production (demand net of PV). PV changes the net-load distribution by generating at times of both high and low demand, with a particular impact on the system's minimum demand. The vertical lines indicate the development of installed capacity since 2010. According to these results, from just over 40 GW of installed capacity onwards, Britain will start seeing negative net demand from PV production alone (not considering the equally significant impact of wind generation). National Grid [14] predicted a strong "turning point" in minimum demand beyond 10 GW of solar, with minimum demand decreasing to 15 GW at 15 GW installed, and to 5 GW at 25 GW installed. Our results show a weaker effect, because the 30 years of time series we use show that nationwide PV output never approaches a capacity factor (CF) of 1 at the time of minimum demand; National Grid's assumption was therefore overly conservative. The lower part of Fig. 15 shows the analogous data for Germany. Germany is much further along in installed capacity, almost reaching 40 GW in 2014, and this much larger capacity has a more substantial effect on the shape of the net demand histogram.
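The minimum-demand analysis can be sketched as follows: hourly net demand is gross demand minus capacity factor times installed capacity, and the minimum is taken across all simulated PV years combined with one fixed demand year. `min_net_demand_cf_one` contrasts this with the conservative assumption, discussed above for the National Grid projection, that PV runs at CF = 1 at the hour of minimum demand. All names are hypothetical, and demand and PV series are assumed to be aligned hour by hour.

```python
def net_demand(demand_gw, pv_cf, capacity_gw):
    """Hourly demand net of PV: gross demand minus capacity factor times installed PV capacity."""
    return [d - cf * capacity_gw for d, cf in zip(demand_gw, pv_cf)]


def min_net_demand(demand_gw, pv_cf_by_year, capacity_gw):
    """Lowest hourly net demand when one fixed demand year is combined with each simulated PV year."""
    return min(min(net_demand(demand_gw, cf, capacity_gw)) for cf in pv_cf_by_year)


def min_net_demand_cf_one(demand_gw, capacity_gw):
    """Conservative comparison: assume PV runs at CF = 1 at the hour of minimum demand,
    so every GW of PV removes one GW from minimum demand."""
    return min(demand_gw) - capacity_gw


def capacity_for_negative_demand(demand_gw, pv_cf_by_year, step_gw=0.5, max_gw=200.0):
    """Smallest capacity on a grid of `step_gw` at which net demand first drops below zero,
    or None if that does not happen up to `max_gw`."""
    n_steps = int(max_gw / step_gw)
    for i in range(n_steps + 1):
        capacity = i * step_gw
        if min_net_demand(demand_gw, pv_cf_by_year, capacity) < 0:
            return capacity
    return None
```

Because the simulated capacity factor at low-demand hours stays well below 1, `min_net_demand` falls more slowly with capacity than `min_net_demand_cf_one`, which is the gap between the two approaches described in the text.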
To examine the effect of PV on net demand in more detail, we take two example days, one in winter and one in summer, chosen as the days with the highest and lowest net demand in 2014, and plot them as "duck curves" [8]. Fig. 16 shows, for Germany (top), the gross demand on these winter and summer days in 2014 as a dashed black line. From this gross demand, we subtract various amounts of PV generation to obtain demand net of PV. The thick black line shows net demand based on 2014 TNO-reported PV output. The colored lines represent 1.5 and 3 times the 2014 installed PV capacity, simulated across all 30 years of hourly outputs for the specific day. These curves show the range of net demand over those 30 years, assuming the 2014 demand curve remains constant. This removes the confounding effect of economic and population growth on gross demand, but ignores the fact that demand correlates with irradiance and thus with PV output (as do the figures above that use 2014 demand). For example, a cold and dark day will have higher heating demand than a warm and sunny day. Nevertheless, given the available data, this serves as a reasonable estimate of the magnitude and range of the effect that future PV deployment will have. Perhaps the most important messages from this figure are the dramatic differences between winter and summer, and between years. Even triple the current capacity (over 100 GW) does not push the winter day's demand much below its existing minimum, still leaving a significant positive net demand in the majority of the 30 simulated years. The range of results between years is substantial, and is overlooked when considering only a single meteorological year (e.g. Ref. [8]). The minimum summer net demand in Germany with 114.5 GW of PV could range from +11 GW on a cloudy day to −30 GW on a sunny day. Given that every year will contain a mix of such days (as shown by the wide inter-day variation in Fig. 13), this highlights the widening range of situations the network operator will have to cope with on a day-to-day basis because of solar PV.
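The duck-curve envelopes described above (median and minimum-to-maximum range across simulated years, for one fixed demand day) can be sketched as below. This hypothetical helper assumes one hourly value per hour of the chosen day and a PV capacity given directly in GW, for example 3 × the 2014 installed capacity.

```python
import statistics


def duck_curve_envelope(demand_day_gw, pv_day_cf_by_year, capacity_gw):
    """Per-hour (min, median, max) net demand for one fixed demand day across simulated PV years.

    demand_day_gw: hourly gross demand for the chosen day, held constant across years.
    pv_day_cf_by_year: one hourly capacity-factor profile per simulated year for the same
    calendar day (same length as demand_day_gw).
    """
    envelope = []
    for hour, demand in enumerate(demand_day_gw):
        values = [demand - profile[hour] * capacity_gw for profile in pv_day_cf_by_year]
        envelope.append((min(values), statistics.median(values), max(values)))
    return envelope
```

Interquartile bands like those in Fig. 16 could be added per hour in the same loop with `statistics.quantiles(values, n=4)`.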
The comparison with Britain in the lower part of Fig. 16 draws this out even more clearly. The seasonal difference is even more pronounced: on the winter day, PV production barely makes a dent in net demand. The stark contrast between summer and winter indicates the difficulty facing power systems with high shares of solar PV, and the degree to which other power sources or storage must be available to fill these net demand gaps. In Germany, with increasing PV deployment, the rate of change of net demand during the summer morning (ramp down) and evening (ramp up) will be higher than ever experienced before. This will stress the physical operation of the system and necessitate more flexible generation, as opposed to inflexible baseload generators such as coal- and lignite-fired plants. In practice, power systems across Europe currently operate with must-run baseload plants and are ill-suited to accommodating large amounts of variable generation. In Britain, National Grid (the TNO) states that accommodating more than 10 GW of PV capacity will not be possible without making operation of the transmission system significantly more difficult [34].
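The ramping requirement can be quantified from an hourly net demand series by differencing it. A minimal sketch with hypothetical function names, assuming hourly resolution so that differences are in GW per hour:

```python
def hourly_ramps(net_demand_gw):
    """Hour-to-hour change in net demand (GW per hour); negative values are ramps down."""
    return [later - earlier for earlier, later in zip(net_demand_gw, net_demand_gw[1:])]


def steepest_ramps(net_demand_gw):
    """Return (largest ramp down, largest ramp up) in GW per hour as non-negative magnitudes;
    0.0 means no ramp in that direction occurs."""
    ramps = hourly_ramps(net_demand_gw)
    return max(0.0, -min(ramps)), max(0.0, max(ramps))
```

Comparing `steepest_ramps` for gross demand and for net demand at increasing PV capacities shows how PV steepens the summer morning ramp down and evening ramp up.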
Finally, Fig. 17 shows the long-term (30-year) annual average capacity factors for individual European countries and for Europe as a whole. The year-to-year variation is relatively minor, and this variation is also relatively consistent across Europe (see the European mean and the three individual countries on the left-hand side of the figure). As can be seen on the right-hand side of the figure, the results are broadly consistent with what we would expect for most countries. This shows that MERRA-2 performs well both in long-term temporal stability and in spatially averaged per-country mean capacity factors. Again, this tempers the conclusion that SARAH is unconditionally the preferable dataset. See the supplementary material for figures of the other datasets and for a figure comparing the per-country bias between SARAH and MERRA.
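One simple way to summarize the year-to-year variation is the coefficient of variation of annual mean capacity factors. The sketch below is illustrative only: the function names are hypothetical and the choice of the sample standard deviation is an assumption, not a statistic reported in this study.

```python
import statistics


def annual_means(hourly_cf_by_year):
    """Mean capacity factor of each simulated year."""
    return [sum(year) / len(year) for year in hourly_cf_by_year]


def interannual_cv(hourly_cf_by_year):
    """Coefficient of variation (sample standard deviation / mean) of annual mean capacity factors.
    Requires at least two years."""
    means = annual_means(hourly_cf_by_year)
    return statistics.stdev(means) / statistics.mean(means)
```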

Discussion and conclusion
We describe PV power output simulations using meteorological reanalysis and satellite-measured data. After validating the simulation results against a large set of real PV site outputs and several nationally aggregated time series reported by transmission network operators, we examine the use of empirically derived correction factors to correct for systematic bias in the underlying data, and present several applications of these simulations.
High-quality renewable energy simulation data are of paramount importance and therefore in high demand, yet researchers currently have to create ad-hoc simulations, expending significant time and effort to acquire and process reanalysis data or other data sources. To reduce this duplication of work, a web application, Renewables.ninja, was developed to make the simulations developed here available online for others to use (see Fig. 18). The platform can integrate updated and improved versions of the simulations as they become available, and also makes available the wind simulations described in Ref. [42]. An application programming interface (API) provides a well-defined, standardized interface through which other software can interact with Renewables.ninja.
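As an illustration of programmatic access, the sketch below builds (but does not send) a PV simulation request. The base URL, the `data/pv` endpoint, the token-based `Authorization` header and the query parameter names are assumptions based on the platform's public API documentation at the time of writing and should be checked against the current version; `build_pv_request` itself is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint; check against the current Renewables.ninja API documentation.
API_BASE = "https://www.renewables.ninja/api/"


def build_pv_request(token, lat, lon, date_from, date_to, dataset="merra2",
                     capacity=1.0, system_loss=0.1, tracking=0, tilt=35, azim=180):
    """Return (url, headers) for a hypothetical PV simulation request, without sending it.

    Parameter names mirror the assumed API query parameters; `capacity` is assumed to be in kW
    and `system_loss` a fraction.
    """
    params = {
        "lat": lat,
        "lon": lon,
        "date_from": date_from,
        "date_to": date_to,
        "dataset": dataset,
        "capacity": capacity,
        "system_loss": system_loss,
        "tracking": tracking,
        "tilt": tilt,
        "azim": azim,
        "format": "json",
    }
    url = API_BASE + "data/pv?" + urlencode(params)
    headers = {"Authorization": "Token " + token}
    return url, headers
```

The request could then be sent with `urllib.request.urlopen(urllib.request.Request(url, headers=headers))` and the response body decoded with `json.loads`; separating request construction from sending keeps the sketch testable offline.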
We use the validated PV simulations to examine how increasing PV deployment substantially changes net power demand. In particular, we find that Europe, and other regions with significant solar deployment, can expect fundamental problems with the grid integration of solar, as shown for Britain and Germany in Fig. 16. Even assuming that large-scale storage is available for daily or weekly balancing, the seasonal balancing problem remains and will likely be challenging to resolve. Long-term simulations whose reliability is established through validation, as presented here, will be fundamental to understanding these effects and to developing technical and economic strategies to address them. We can also conclude that none of the data sources investigated here is ideal. While SARAH represents hour-by-hour events more accurately at individual sites, it contains just as much average bias as MERRA and offers similar performance when aggregated to country level. In addition, SARAH requires more effort to clean missing or erroneous observations, whereas MERRA is more consistent on a long-term seasonal basis. In both cases, long-run average spatial calibration is required for accurate results. This is likely to become easier and more nuanced with increasing PV deployment and the resulting increase in available measured panel output data.
The simulations presented here could be improved in various ways. Missing or inaccurate site-level metadata was a notable barrier to the simulations we performed. To alleviate this, additional sources of measured data with better metadata could be included in future validation, or metadata could be inferred, for example by using panel angles from one dataset to make assumptions about the angles at neighboring sites in another dataset. Including additional forms of metadata in the simulations could also improve results; for example, shading effects systematically change the shape of the irradiance curve throughout the day. The accuracy of the measured data is known only for the DTI dataset, not for the other two, so additional validation data with better-known accuracy characteristics may improve the results. However, measurement error in the validation data is likely a minor source of error compared with missing metadata and the input data used for the simulations. Indeed, the largest gain would come from improved input data. The use of meteorological reanalysis for solar power simulation is still a recent development, and future reanalysis models could take this new use into account and improve the aspects relevant to it, such as a more detailed consideration of aerosol measurements [38]. Finally, it is not clear to what extent these results can be generalized to other world regions and to other reanalysis and satellite-based datasets. It is generally accepted that some reanalyses perform better in certain parts of the world than in others. The extent of generalizability could only be determined by performing inter-reanalysis comparisons and/or by acquiring additional measured data from other parts of the world against which to validate simulated outputs.
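The metadata inference suggested above (borrowing panel angles from neighboring sites) could take the form of a nearest-neighbor estimate. The sketch below is a hypothetical illustration, not a method used in this study: it assumes a spherical Earth for distances, sites represented as dicts with `lat`, `lon` and (for reference sites) `tilt`, and the median of the k nearest known tilts as the estimate.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def infer_tilt(site, reference_sites, k=5):
    """Guess a site's panel tilt as the median tilt of its k nearest reference sites.

    site: dict with "lat" and "lon"; reference_sites: dicts with "lat", "lon" and known "tilt".
    """
    nearest = sorted(
        reference_sites,
        key=lambda ref: haversine_km(site["lat"], site["lon"], ref["lat"], ref["lon"]),
    )[:k]
    return statistics.median(ref["tilt"] for ref in nearest)
```

The median is used rather than the mean so that a single unusually steep or flat neighbor does not dominate the estimate; azimuth could be inferred the same way, although averaging angles near north would need circular statistics.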

Fig. 2. Overview of the approach used to model PV power output.

Fig. 3. Locations from which measured PV panel output is available, color-coded to indicate the length of time series available. Number of sites: 1029 in total (PVLog: 200, DTI: 227, PVOutput: 602). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. (a) Histogram of PV site azimuth angles, i.e. the direction the panels are facing. (b) PV site tilt angles by latitude. There is a slight trend towards steeper angles at higher latitudes; the red line indicates a linear regression. The two vertical dotted lines indicate the range of latitude within which Germany falls. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Simulated and measured hourly power production for a site in the Czech Republic (Approximate coordinates 49.0, 14.7) during six days in May 2014.

Fig. 8. Root mean square error (RMSE) between measured and modeled capacity factors across all simulated sites. The capacity factor is a unitless quantity, so its error is also unitless. (a) Daily, (b) hourly.

Fig. 9. Average daily capacity factors from the validation sites, aggregated to the country level for each season, and a comparison to the Transmission Network Operator (TNO) reported outputs for the same countries.Numbers in parentheses are the seasonal means for simulated (S) and measured (M) data.Figures for additional countries in the supplementary material.

Fig. 10. Observed capacity factors from sources reporting installed capacity and production data, compared to our uncorrected model results and prior work. The left y-axis shows mean annual capacity factors, while the right y-axis shows the annual output (in kWh) per installed capacity (in kW), which is equivalent to the number of annual full load hours.

Fig. 11. Uncorrected annual mean capacity factor (European mean). The dotted lines indicate the interannual mean, which is also given in parentheses in the figure legend.

Fig. 13. Daily capacity factors in the UK from corrected hourly simulations for 30 years (1985–2014). Figures for additional countries in the supplementary material.

Fig. 14. Diurnal variability of PV capacity factors in the UK from hourly simulations for 30 years (1985–2014). Figures for additional countries in the supplementary material.

Fig. 15. Correlation of demand and PV output in Great Britain (top) and Germany (bottom). (a) Comparison of 2014 TNO-reported hourly demand and hourly PV production. (b) Minimum demand from hourly simulations for 30 years (1985–2014) against 2014 hourly demand data. Figures for additional countries in the supplementary material.

Fig. 16. Electricity demand in Germany (top) and Great Britain (bottom) in 2014 net of different installed PV capacities. The minimum and maximum net demand days in 2014 are chosen; the black dashed line represents gross demand, and the thick black line the net demand with 2014 installed PV capacity. The thick colored lines are the median across 30 years, while the two lighter shades of each color indicate the 25%–75% and the minimum-maximum range. Figures for additional countries in the supplementary material. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
Comparison of root mean square error (RMSE) between different temperature-irradiance coefficients, across all individually modeled sites.

Table 2
Root mean square errors (RMSE) for country-level simulations, comparing MERRA-2 and SARAH optimal and randomized runs.