Dealing with multiple decades of hourly wind and PV time series in energy models: A comparison of methods to reduce time resolution and the planning implications of inter-annual variability

Using a high-resolution planning model of the Great Britain power system and 25 years of simulated wind and PV generation data, this study compares different methods to reduce time resolution of energy models to increase their computational tractability: downsampling, clustering, and heuristics. By comparing model results in terms of costs and installed capacities across different methods, this study shows that the best method depends heavily on input data and the setup of model constraints. This implies that there is no one-size-fits-all approach to the problem of time step reduction, but heuristic approaches appear promising. In addition, the 25 years of time series demonstrate considerable inter-year variability in wind and PV power output. This further complicates the problem of time detail in energy models as it suggests long time series are necessary. Model results with high shares of PV and wind generation using a single or few years of data are likely unreliable. Better modeling and planning methods are required to determine robust scenarios with high shares of variable renewables. The methods are implemented in the freely available open-source modeling framework Calliope. Copyright © 2017 The Author. Published by Elsevier Ltd. Open Access Article licensed CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)


Introduction
Energy system models were first developed in the 1970s by the International Energy Agency (IEA) and the International Institute for Applied Systems Analysis (IIASA) in the aftermath of the international oil crisis. Using optimization methods, in particular linear programming, they allowed analysts to structure their assumptions and data, forming them into internally coherent scenarios of how energy is extracted, converted, transported, and used, and how these processes might change in the future. Today, with the increasing deployment of variable renewable generation, the global energy system is again undergoing a fundamental transition. Global installed wind power capacity reached about 417 GW in 2015, up from 17 GW in 2000, while solar photovoltaics (PV) has experienced an even higher growth rate, with capacity rising from below 1 GW in 2000 to 222 GW in 2015 globally [1]. Energy models are important decision-making aids to help navigate the transformation of the current fossil-fuel based energy system to one based on clean and renewable energy [2].
In this context, the rising importance of variable renewable generation has presented two crucial and related problems to energy modelers. The first problem is procuring data on the generation potential for wind and PV power with sufficient resolution in space and time, then integrating this data into power system models such as LIMES-EU [3] or larger energy system models such as TIMES and TIAM [4]. Having data with temporal resolution of one hour or better allows a model to depict the hour-by-hour and day-by-day fluctuations in power output from these technologies, how they correlate with each other and with power demand [5]. The second problem pertains to the interannual variability of renewable generation, which requires many years of data to fully address. Recent work has started to address the first problem but in the main, studies are limited to a single or a small number of years [6][7][8][9]. The provision of longer time series requires input data ideally spanning multiple decades. This is becoming possible through the use of global reanalysis data for energy modeling [10,11].
Both of these problems overlap in one crucial area: for large models to be computationally tractable, it is often not feasible for them to include full hourly time series for an entire year, let alone for several decades. This is primarily due to the computational requirements of running what are often linear or mixed-integer optimization models. Work has started to emerge investigating ways to reduce the number of time steps in models while retaining relevant detail, primarily by using statistical clustering methods to derive a set of typical days or weeks from a larger input time series, then feeding those days into an energy model [12][13][14][15]. However, there are well-known limitations to statistical clustering methods, including the fact that most methods produce clusters even in homogeneous datasets, and that clusters must be validated and their stability assessed [16]. This implies that approaches may vary in performance in different years, and that specific approaches are more or less suitable depending on the structure of the underlying model. This paper treats this problem systematically, by examining different techniques to reduce the time resolution of energy models and their impact on model performance and results, and by doing so, answers two questions. First, how accurate are different methods to reduce time resolution when compared on the same model and with different model configurations? Second, how can time resolution be reduced in the most efficient way while maintaining scientific accuracy? The rationale when reducing the number of time steps is to balance improved computational performance with model accuracy. This study compares different approaches to achieve this, including downsampling, statistical clustering methods, and heuristic selection of specific days and weeks. 
Wind and solar generation show significant inter-annual variability, so multi-decade time series data should ideally be used to represent the full range of this variability [11,10]. This further complicates the answers to the two questions posed above: when considering power systems with very high shares of variable renewable generation, it is necessary to also consider whether the differences in accuracy between different time resolution reduction approaches persist across different years or when examining multi-decade time series. The analysis is performed with a model of the UK power system based on the open-source Calliope high-resolution modeling framework [8]. Table 1 gives a stylized overview of the ways in which temporal resolution has been included in energy models in order of increasing detail (spatial resolution is included for completeness, but this paper focuses on temporal resolution). With the displacement of traditional power generation by variable renewables expected to increase, energy modelers have been moving downwards in this table from lower to higher resolutions. Approaches such as average availabilities for technologies, which were sufficient when modeling baseload or completely dispatchable generators such as coal or nuclear power, have been replaced with more explicit treatment of time. However, this more explicit treatment comes at a computational cost. Assuming a model with a single year of 8760 hourly time steps, 20 technologies (such as wind generation or electric heating), 20 locations and 5 time-dependent constraints (such as maximum power generation per location, storage charging, and discharging), more than 17 million total constraints would result. Reducing such a model's size by one or two orders of magnitude by reducing the number of time steps brings with it a concomitant reduction in computational complexity, and thus in required CPU time and memory requirements to solve it.
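The constraint count quoted above follows from multiplying the model dimensions together. The following is a minimal sketch of that arithmetic, using only the illustrative figures from the text; the function name is hypothetical, and a real model also has time-independent constraints that are not counted here.

```python
# Illustrative model dimensions taken from the paragraph above.
HOURS_PER_YEAR = 8760
N_TECHNOLOGIES = 20
N_LOCATIONS = 20
N_TIME_DEPENDENT_CONSTRAINTS = 5


def constraint_count(n_time_steps):
    """Number of time-dependent constraints for a given number of time steps."""
    return (n_time_steps * N_TECHNOLOGIES * N_LOCATIONS
            * N_TIME_DEPENDENT_CONSTRAINTS)


# Compare the hourly model with uniformly coarser resolutions.
for step_hours in (1, 3, 6, 12, 24):
    n_steps = HOURS_PER_YEAR // step_hours
    print(f"{step_hours:>2}-hourly: {n_steps:>5} steps, "
          f"{constraint_count(n_steps):>10,} constraints")
```

Under these assumptions the count scales linearly with the number of time steps, so 24-hourly resolution would cut it by a factor of 24.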

Background
There are broadly two ways to explicitly include temporal detail in energy models without including full time series data. The first is time slices, that is, a reduced set of time steps chosen to characterize key aspects of temporal variability, for example by covering weekdays and weekends, different times of day, and different seasons. For example, four seasons with four times per day for both weekdays and weekends would result in 32 time slices. Large energy system models such as TIMES generally use time slices (e.g., [19,20]). The second way is representative or typical days (or longer time periods) extracted or artificially constructed from full time series. Typical days are an intermediate step on the path towards full time series: selecting a number of specific days from the data covering as much variability as possible, or constructing synthetic days by clustering the data. In both cases the goal is to preserve the relevant statistical properties of the time series and thus minimize impact on model results.
Two problems arise when reducing the resolution of model input time series: concurrency and continuity. It is important that the correlation between events is realistic in an energy model. For example, a stormy winter day may induce higher heating demand and reduce PV power output to almost zero, but provide above-average wind power production. A concurrency problem arises when a subset of typical heat demand days is mixed with a subset of typical wind production days and such correlations are lost. This can be circumvented by ensuring that an internally consistent set of input data is used across the entire model. The second problem is that of continuity, and appears when there is a state in a model that needs to carry over from one time step to the next. The prime example of this is the state of charge of storage facilities. Ensuring continuity between time steps can be difficult when picking representative days, so recent approaches have often used groups of consecutive days [14]. Selecting days should take into consideration a statistical measure of representativeness, as a selection based on a typical (e.g. seasonal) basis has been shown to be inferior for high shares of variable renewables [21].
Statistical clustering is a way to group samples into groups called clusters, such that the similarity of samples within a cluster is higher than between clusters. It can be used to select representative time periods, and has been applied in particular when studying demand profiles and to estimate load profiles where data is limited. Rhodes et al. [22] use the k-means clustering technique to group homes with similar hourly electricity demand profiles and use regression to determine which variables influence demand in a cluster. Similar clustering approaches for electricity demand were also used in Räsänen et al. [23] and McLoughlin et al. [24]. Green et al. [12] use k-means clustering for power demand in a mixed-integer Great Britain (GB) power system dispatch model written in GAMS, for one year at a time over the 12 years in the period 1994-2005, reporting no more than around 1% model error in system-wide power costs but a model speedup of a factor of about 60 when using their clustered demand data. Clustering has also been used for more than just demand data. Heuberger et al. [15] use the k-means clustering approach reported in Green et al. [12] for demand and for wind and solar production in a cost-optimizing mixed-integer GB power system planning model, reporting about a 0.6% error in system-wide cost results and a 4% error in technology-specific costs when comparing to unclustered data. Baringo and Conejo [25] compare the use of load duration curves and k-means clustering for an efficient representation of the spatial correlation between wind generation and power demand for investment planning purposes, finding that k-means clustering is more accurate than load duration curves, as it can depict the correlation between different locations. Mena et al. [26] use hierarchical clustering to speed up a differential evolution algorithm for the optimal design of a distributed renewable generation system. Similarly, Nahmmacher et al. [14] use hierarchical clustering of time steps using input data from the ERA-Interim reanalysis for 1979-2011 for the long-term European power system planning model LIMES-EU, a cost-minimizing linear optimization model. They find that 6 representative days covered by 48 3-hourly time slices are sufficient when comparing to 100 representative days, with total system costs just 4% lower.
Clustering is not the only possible method to reduce time resolution, and other work has taken different approaches. For example, Poncelet et al. [13] present an optimization approach to selecting time steps. Samsatli and Samsatli [27] hierarchically decompose the complete time series into slices of different granularity, in a refinement of the time slice approach used in models like TIMES. Hsu [28] summarizes recent work on clustering to identify groups of electricity consumers, and is the only recent study to assess two different methods with respect to prediction accuracy and cluster stability: k-means and clusterwise regression. He finds that k-means clustering results in relatively stable clusterings, but poorer predictive accuracy compared to clusterwise regression, concluding that given this varying accuracy, appropriate clustering methods must be identified for particular use cases. Parpas [29] argues that for energy systems, reducing a problem to typical states, such as an average winter evening demand, is an oversimplification, and that by using typical states, a model ends up underestimating the energy system's true costs. On the other hand, Ludig et al. [30] showed that with higher time step resolution, a long-term energy model for part of the German power grid computes higher overall system costs, but the effect of this on climate mitigation costs was negligible. One can conclude that the necessity to use hourly rather than lower-resolved (6, 12, and 24-hourly) time series data when considering high shares of renewables depends on the research question being answered.
While there is a recent wealth of work using various methods to reduce time resolution in energy models, this work leaves open the question of appropriateness of different methods for different model configurations and whether some approaches could achieve accuracy with less detail. Most past work does not motivate the choice of a specific method of reducing time resolution versus the many available options. There are therefore two open questions. First, how accurate are different methods to reduce time resolution when compared on the same model and with different model configurations? Second, how can time resolution be reduced in the most efficient way, balancing accuracy and data? Also, most past work has not assessed systems with very high shares of variable renewable generation, but such systems are vulnerable to the significant inter-annual variability of the renewable resource [10]. Thus it is also necessary to consider whether differences in accuracy between time resolution reduction approaches persist across different years or when examining multi-decade time series. This final point leads from modeling problems (dealing with data volume and model size) to planning problems (planning power systems with high shares of renewables given the inter-year weather variability and resulting wind and PV power variability).

Model and data
This analysis uses a cost-minimizing linear optimization model that can determine installed capacity, power plant dispatch, transmission line capacity and use between model zones using net transfer capacities, line losses, storage capacity and charging/discharging, based on the Calliope energy modeling framework [8]. The model determines installed capacities while simultaneously establishing the hour-by-hour operation of all plants. In all model runs, plant capacity is decided simultaneously for all units without consideration of different planning horizons or deployment over time. The basic structure of Calliope was derived from the power nodes modeling framework [31] and uses a model formulation based on nodes defined by a set of technologies and locations, with nodes able to supply, store, transmit, or demand energy depending on the constraints specified by the defined technologies. The model balances electricity supply and demand across the modeled system. The structure is described in more detail in Pfenninger and Keirstead [8] and in the Calliope documentation. Calliope reports levelized costs of electricity (LCOE) for each technology by accounting for both construction and operational costs. System-wide LCOE are computed as a production-weighted average of all technology-specific LCOEs, including those of transmission and storage technologies. The objective function is to minimize total system cost, so all decisions, including power plant dispatch or storage operation, are taken from the perspective of a central planner with perfect information.
The model used for this study is based on a modified version of the 20-zone GB power system model from Pfenninger and Keirstead [8] (see Fig. 1). The zones are based on the National Grid transmission system [32], with transmission constraints between the zones derived from data on power flows across their boundaries [33]. The technologies considered are onshore wind, offshore wind, PV, hydro, pumped hydro and battery storage. To simplify the model used, non-renewable generation is represented by two technologies: baseload (with higher capital costs and lower operating costs) and dispatch (with lower capital but higher operating costs). No capacity additions for hydro or pumped hydro are allowed. These simplifications allow focusing on the key characteristics of interest for this study: high shares of wind and PV power and the effect of different methods to reduce time step resolution on model results.
The Calliope framework was designed for high resolution data in space and time. This analysis uses high resolution in time, with 25 years of hourly time series data for wind and PV generation. These data are obtained using the extensively validated Renewables.ninja PV and wind simulation methods [10,11]. Wind power is obtained with the Virtual Wind Farm model [11] by extrapolating wind speeds from the NASA MERRA reanalysis [34] to a hub height of 80 m, then simulating power production using aggregated power curves of the UK's five most common turbine models [35]. 1 MW offshore and onshore farms are simulated at each offshore/onshore MERRA grid point [8] and the resulting time series are averaged to the 20 model zones. PV production is calculated with the Global Solar Energy Estimator model [10], which uses an indirect irradiance estimation based on Ridley et al. [36] and a PV performance model based on Huld et al. [37] to simulate PV production from southwards facing panels tilted at 35 degrees in each onshore MERRA grid point. Demand data from National Grid for the years 2002-2013 are included, based on 30-min National Grid data [38], and disaggregated spatially into the model zones by using total annual demand data per local authority and assigning each local authority to one of the 20 zones. This assumes that demand shape is the same in each zone. For years before 2002, where no demand data is available, 2002 data is used. The complete model code and data are made available online (see supplementary material).

Table 1 Main approaches to include temporal detail in energy models with an indication of the order of magnitude of resolution in time. LDCs = load duration curves. Time step resolution is typically hourly or higher for time slices, typical days, or full time series. [18]

Methods to reduce time resolution
For the work presented here, the Calliope framework was extended with implementations of all the time resolution reduction methods compared. Time series data is stored at 1-hourly resolution, and any adjustments are performed in-memory based on specifications in Calliope's model configuration before the model is solved. This makes it possible to reproduce the changes made to input data and to adjust those changes for future model runs without losing the higher detail contained in the input data. Table 2 gives an overview of the different approaches used to reduce time resolution. The most simple approach is downsampling, where the entire time series is simply downsampled to a lower resolution (e.g. from 1-hourly to 3-hourly). Heuristic selection is the selection of days or full calendar weeks based on criteria such as the week containing the maximum or minimum daily average for the given time series, or the relative difference between time series being maximal or minimal. Two clustering methods are compared: hierarchical and k-means clustering, described in more detail below.
These methods are combined in various ways to yield a total of 42 time resolution reduction approaches, some of which are single methods, others a combination of multiple methods. A full list of the approaches and their parameters is given in Table 3. These approaches are chosen to test a wide range of methods on the same model. They are assessed for their ability to deliver accurate results with as few time steps as possible. The process of applying any of the 42 approaches is as follows. Before applying clustering or any of the heuristic methods, time series are normalized by the maximum value across all time steps and model zones (for downsampling, no normalization is performed). If a heuristic selection is applied, the remaining (unchosen by the heuristic) data can be either removed, downsampled or clustered, depending on the configuration of the chosen approach. The clustering methods are described below. After applying downsampling, heuristic selection, clustering, or a combination of these, a weight is determined for each remaining time step according to the number of original time steps it represents (i.e. a downsampled 6-hourly time step gets a weight of 6). The final time series is then re-scaled to match the mean of the original time series. This overall approach is similar to that presented in Nahmmacher et al. [14].
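The weighting and re-scaling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Calliope's implementation: the function names are hypothetical, a single one-dimensional series is used, and the series length is assumed to be a multiple of the downsampling factor.

```python
import numpy as np


def downsample(series, factor):
    """Average every `factor` consecutive steps; each result gets weight `factor`."""
    values = np.asarray(series, dtype=float)
    if values.size % factor != 0:
        raise ValueError("series length must be a multiple of factor")
    reduced = values.reshape(-1, factor).mean(axis=1)
    weights = np.full(reduced.size, factor)
    return reduced, weights


def rescale_to_mean(reduced, weights, original_mean):
    """Scale the reduced series so its weighted mean matches the original mean."""
    weighted_mean = np.average(reduced, weights=weights)
    if weighted_mean == 0:
        raise ValueError("cannot rescale a series with zero weighted mean")
    return reduced * (original_mean / weighted_mean)


# One day of synthetic hourly values, downsampled to 6-hourly resolution.
hourly = np.arange(24, dtype=float)
six_hourly, step_weights = downsample(hourly, 6)

# If only some steps are kept (as a heuristic selection would do), the
# remaining steps are re-scaled to restore the original mean.
kept, kept_weights = six_hourly[:2], step_weights[:2]
rescaled = rescale_to_mean(kept, kept_weights, hourly.mean())
```

Plain downsampling by averaging already preserves the mean, so the re-scaling step only changes the data when time steps are dropped or replaced by cluster representatives.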
Two clustering methods are compared. The first is hierarchical clustering using Ward's method as in Nahmmacher et al. [14]. The clustering algorithm determines clusters by recursively grouping observations together based on Euclidean distance, stopping once a predetermined maximum distance or a predetermined number of clusters is reached. For this study, it is always the number of clusters that was fixed, as outlined below, to match the number of clusters used for the second clustering method, k-means. Clustering with k-means also uses Euclidean distance, but with a heuristic algorithm. To assess the stability of clusters given the initial random selection of cluster centers in the k-means algorithm, the algorithm was run 1000 times for 5 clusters using hourly 2014 data. The resulting cluster labeling was compared with the chance-adjusted Rand Index [39] and Adjusted Mutual Information [40], both of which had a mean of 0.90 and a standard deviation of 0.041 and 0.039 respectively, suggesting that clustering is sufficiently stable.
Examining the sum of squared error (SSE) between the cluster centroid and all cluster members as a function of the number of clusters (the elbow method) to test 1-30 clusters on the 2014 data, between 10 and 15 clusters was found to be the point where SSE flattens off. 5, 10, 15, and 20 clusters were therefore selected for the full analysis. In addition, the impact of the normalization used was assessed by comparing load duration curves (LDCs) for 5, 10, 15, and 20 k-means-derived clusters for the 2014 data. An LDC is a cumulative frequency diagram with an inverted x-axis, showing the distribution over all time steps of capacity factors. Capacity factor is the ratio of power generation to hypothetical power generation if running at full nameplate capacity over a period of time, so a wind capacity factor of 0.6 means that wind is generating at 60% of its nameplate capacity. LDCs are often used in power system analysis to visually summarize the availability of a generation technology over an entire year. The R-squared of the original, 1-hourly LDC compared to non-normalized data was 0.64 (standard deviation 0.084), and 0.91 compared to normalized data (standard deviation 0.019), suggesting that normalization delivers significantly more accurate clustering. To more fully judge the impact different approaches to reduce time resolution have on model results, additional metrics are used below. For both clustering approaches, the observations on which clustering is performed are all time steps for a given day and across all model regions, for four variables: onshore wind, offshore wind and PV generation, and demand. That is, the Euclidean distance is calculated by reshaping the data from a 3-dimensional time steps × locations × variables matrix to a 2-dimensional days × n matrix, where n contains time steps per day, locations, and variables.
Furthermore, different numbers of clusters and two choices for determining typical days are compared: cluster centroids (means) or the real day closest to the cluster centroid, from all days contained in a cluster. The closest real day is found by the minimal Euclidean distance between each real day and the cluster centroid, computed with the Frobenius matrix norm. Both clustering methods use their implementation in the SciPy Python package, and the implementation of all time resolution reduction approaches is available in the Calliope source code. To give an example, the method labeled "min/max solar and wind days - kmeans - 10 days - closest" would first pick the days with maximum and minimum PV and wind generation (up to 4 days), then remove them from consideration and apply k-means clustering to the remaining data, grouping it into 10 clusters, then selecting the closest real days to each cluster centroid, resulting in up to 10 additional days for a total selection of up to 14 days. Table 3 lists all the compared time resolution reduction approaches and the parameters used. In the case of the clustering methods, the derivation of actual days is one parameter (either by selecting cluster centroids/mean values, labeled "mean", or selecting the day from within a given cluster that is closest to the cluster centroid, labeled "closest"), and the number of clusters to look for is the second parameter. The heuristic methods with "drop" in the selection column simply drop all non-selected time steps, i.e. in the case of "min/max wind weeks", all but the weeks with maximum and minimum wind output are dropped.
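The reshaping and typical-day derivation described above can be illustrated with SciPy's hierarchical clustering routines. This is a minimal sketch under stated assumptions rather than the Calliope implementation: the function name `typical_days` and its arguments are hypothetical, the input is assumed to be already normalized, and random numbers stand in for real wind, PV, and demand data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def typical_days(data, n_clusters, steps_per_day=24, how="closest"):
    """Cluster days with Ward's method and return one typical day per cluster.

    data: array of shape (time steps, locations, variables), assumed normalized.
    how:  "mean" for cluster centroids, "closest" for the nearest real day.
    """
    n_steps, n_locations, n_variables = data.shape
    n_days = n_steps // steps_per_day
    # (time steps, locations, variables) -> (days, steps per day * locations * variables)
    days = data[: n_days * steps_per_day].reshape(n_days, -1)

    tree = linkage(days, method="ward")  # Ward linkage on Euclidean distance
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    representatives, weights = [], []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        centroid = days[members].mean(axis=0)
        if how == "mean":
            chosen = centroid
        else:
            # The 2-norm of a flattened day equals the Frobenius norm of the
            # corresponding (steps, locations, variables) array.
            distances = np.linalg.norm(days[members] - centroid, axis=1)
            chosen = days[members[np.argmin(distances)]]
        representatives.append(chosen.reshape(steps_per_day, n_locations, n_variables))
        weights.append(len(members))  # number of original days represented
    return np.array(representatives), np.array(weights)


# Synthetic stand-in data: 30 days, 2 locations, 4 variables.
rng = np.random.default_rng(0)
example = rng.random((24 * 30, 2, 4))
days_out, day_weights = typical_days(example, n_clusters=5)
```

Ward's method is used here because it is deterministic; k-means could be substituted at the labeling step, at the cost of depending on the random initialization of cluster centers.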

Model runs and analyses performed
First, to test the 42 approaches, each one is applied to a year (2014) of hourly data, using simulated renewable generation and reported demand data for that year. The resulting models with reduced time resolution are run for three different scenarios for each time resolution reduction approach. The three scenarios are defined by varying the constraint that controls the amount of variable renewable generation: (1) averaged over the entire modeled time period, a minimum of 50% of power must be supplied from PV and wind (called the "50% renewable generation" scenario), (2) a minimum of 90% ("90% renewable generation"), and (3) a minimum of 90% with the possibility of deploying a large amount of battery storage ("90% with storage"). Except in this last case, the primary means of balancing variable renewables available to the model are deploying the generic dispatchable and baseload technologies. The deliberate lack of options to balance variable renewable output means that these scenarios are difficult to achieve and result in high levelized system costs, so they are not representative of real power system configurations, but are useful test cases to compare the effect of time resolution on models with both a 50% and 90% share of renewable power generation.
Second, a smaller set of time resolution reduction methods is applied to the full 25 years of data. Table 4 lists these eight approaches. The number of clusters (days) selected is higher here than in the methods applied to a single year only, in order to attempt to capture inter-annual variability. In addition, a different heuristic is used to select one or two extreme days for each year of data, such that a representation of all annual extremes is available. "Extreme days" here means, for each of wind generation, PV generation and power demand, the day per year with the maximum and the minimum mean value.
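The extreme-day heuristic described above amounts to finding, per variable and per year, the days with the highest and lowest daily mean. A minimal sketch for a single year and a single variable follows; the function name is hypothetical and the three-day series is synthetic, chosen only to make the selection easy to follow.

```python
import numpy as np


def extreme_days(series, steps_per_day=24):
    """Return (index of day with maximum mean, index of day with minimum mean)."""
    values = np.asarray(series, dtype=float)
    daily_means = values.reshape(-1, steps_per_day).mean(axis=1)
    return int(daily_means.argmax()), int(daily_means.argmin())


# Three synthetic days with constant capacity factors of 0.1, 0.5 and 0.9.
wind = np.concatenate([np.full(24, 0.1), np.full(24, 0.5), np.full(24, 0.9)])
max_day, min_day = extreme_days(wind)

# Applying the same function to several variables gives the set of days to keep.
variables = {"wind": wind, "demand": wind[::-1]}
selected = sorted({day for s in variables.values() for day in extreme_days(s)})
```

Because the same day can be extreme for more than one variable, the number of selected days per year can be smaller than twice the number of variables.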
The primary goal when reducing time resolution is to improve computational tractability of the model, while retaining as much detail as possible. The metric for computational tractability is the number of time steps. The CPU time required to solve a model is also compared. As it is machine and solver dependent, it is normalized by the maximum CPU time over all runs. The metrics for examining the performance of a method are the percent deviation from the reference (1-hourly) model of system-wide LCOE and of installed technology capacities. In addition, the efficiency of an approach is determined by $1 / (n_t \cdot \mathrm{abs}(C))$, where $n_t$ is the number of time steps and $C$ is the relative deviation of LCOE from the 1-hourly reference case. The efficiency is used to rank all time resolution methods compared, and to choose the five most efficient ones for further inspection in the sections below. Other measures to assess the stability and suitability of clustering are possible, for example information-theoretic ones (e.g. [41]). However, here we want to look primarily at the optimization model outcomes, since those are the quantities of interest.
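The efficiency measure can be computed directly from the number of time steps and the LCOE of the reduced and reference runs. The sketch below is illustrative only: the function name is hypothetical, the LCOE values are made-up placeholders rather than results from this study, and returning infinity for a zero deviation is an assumption made here to avoid division by zero.

```python
import math


def efficiency(n_time_steps, lcoe, lcoe_reference):
    """Efficiency 1 / (n_t * abs(C)), with C the relative LCOE deviation."""
    relative_deviation = (lcoe - lcoe_reference) / lcoe_reference
    if relative_deviation == 0:
        return math.inf  # assumption: a perfect match ranks above all others
    return 1.0 / (n_time_steps * abs(relative_deviation))


# Placeholder numbers for two hypothetical approaches (not study results).
approaches = {
    "6-hourly downsampling": efficiency(1460, lcoe=0.095, lcoe_reference=0.100),
    "two selected weeks": efficiency(336, lcoe=0.104, lcoe_reference=0.100),
}
ranking = sorted(approaches, key=approaches.get, reverse=True)
```

The measure rewards approaches that use few time steps and stay close to the reference LCOE; since only the absolute deviation enters, over- and underestimates of cost are treated alike.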

Downsampling
We first examine uniform downsampling of the input time series to 3-hourly, 6-hourly, 12-hourly and 24-hourly time steps, for the year 2014. Fig. 2 compares the effect of this downsampling on CPU time required to solve a given model, and the resulting optimal wind generation capacity. CPU time is normalized by the maximum reported CPU time since the absolute time is computer-dependent. The figure shows a significant reduction in normalized CPU time with each reduction of time resolution. The reduction in CPU time is most noticeable in the "90% with storage" scenario, since that model has more complex constraints with respect to time by including storage charging and discharging. Time resolution influences the system configuration found by the model considerably, as seen in the figure by the amount of installed offshore wind capacity. Since the optimization finds the minimal-cost configuration and is forced to match supply with demand, averaging renewable generation and demand time series over ever longer time periods results in lower installed capacities. This is because as demand peaks and production troughs are smoothed, it becomes easier for the model to fulfill the balancing constraint. The availability of storage in the system in the "90% with storage" scenario reduces the difference in installed wind capacity between time resolutions, since the smoothing of output and demand fluctuations now takes place through storage. These results are a first indication that the appropriate method to reduce time resolution depends heavily on the model setup and constraints used.
One issue with downsampling of input data is the loss of intra-time-step variability, which, as Fig. 2 shows, has a particularly strong effect on results from models with high shares of variable renewable generation but little storage. Fig. 3 compares model results from a 6-hourly and a 1-hourly run, for the first seven days in June 2014. Since the model used optimizes both plant deployment and dispatch simultaneously, the loss of detail through smoothing of peaks and troughs (for example on June 5th) can strongly affect model results. The loss of detail seen in this figure explains why downsampling the input data leads to a reduction of installed capacity in the "90% renewable generation" scenario. It removes the need for peaking capacity during particularly non-windy or non-sunny hours, since they are simply smoothed out in the time series. This effect is not as strong in the "50% renewable generation" case given that 50% of demand is covered by non-variable generation.
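The smoothing effect can be reproduced on a toy example by comparing the peak of demand minus renewable output (the residual load that other capacity must cover) at different resolutions. The sketch below uses a single synthetic day with invented numbers and a hypothetical function name; it illustrates the mechanism only and is not derived from the model results.

```python
import numpy as np


def peak_residual(demand, renewable, factor):
    """Peak of (demand - renewable) after averaging over `factor` hours."""
    residual = np.asarray(demand, dtype=float) - np.asarray(renewable, dtype=float)
    # Assumes the series length is a multiple of `factor`.
    return residual.reshape(-1, factor).mean(axis=1).max()


# Synthetic day: an evening demand peak coinciding with a wind lull.
demand = np.array([30] * 6 + [40] * 6 + [45] * 6 + [55, 60, 58, 50, 40, 35], dtype=float)
wind = np.array([20] * 12 + [15] * 6 + [2, 1, 3, 10, 18, 20], dtype=float)

peaks = {factor: peak_residual(demand, wind, factor) for factor in (1, 6, 24)}
for factor, peak in peaks.items():
    print(f"{factor:>2}-hourly peak residual load: {peak:.1f}")
```

Averaging can never raise the maximum of a series, so the peak residual load seen by the model can only stay the same or fall as resolution is reduced, which is consistent with the lower installed capacities described above.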

Clustering and heuristic selection
We now examine more complex methods to reduce time resolution and whether they can offset some of the undesirable side effects of downsampling. We first examine the effect these methods have on the structure of the demand and wind/PV generation time series for 2014 data, by using load duration curves (LDCs) and the correlation between wind and PV. Table 5 shows the configurations we analyze more closely: the two methods with the highest efficiency (labeled 1 and 2, both combinations of individual methods), a k-means and a hierarchical clustering method for comparison (labeled 3 and 4), and 6-hourly downsampling (labeled 5). Table S1 in the supplementary material contains the full list of all tested approaches. The efficiency measure used to select methods 1 and 2 is as defined in the methods section: the inverse of the number of time steps multiplied by the relative deviation of model-reported LCOE from the 1-hourly reference case, in other words, a measure of how well a time resolution reduction method can approximate the 1-hourly reference case with as few time steps as possible. The selection of the most efficient methods will be discussed in more detail in the next section. Fig. 4 shows load duration curves for offshore wind and PV for the 1-hourly 2014 data and for the five methods listed in Table 5. The reduced time series always contain fewer than 8760 time steps, but to draw the figure, each clustered time step is replicated along the x-axis according to how many actual time steps it represents. Uniform downsampling of the time series to 6 h does not substantially affect the shape of the load duration curve for wind, whereas for PV, a 6-hourly resolution is insufficient to capture the diurnal solar cycle. The shape of the PV load duration curve is therefore skewed towards more hours with lower capacity factors.
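The replication of clustered time steps when drawing a load duration curve can be sketched as follows. This is an illustrative reconstruction only: the function name `load_duration_curve`, the integer weights, and the example values are assumptions, not the plotting code used for Fig. 4.

```python
def load_duration_curve(capacity_factors, weights):
    """Sort capacity factors in descending order, repeating each
    representative time step as often as the number of original
    time steps it stands for."""
    if len(capacity_factors) != len(weights):
        raise ValueError("need exactly one weight per time step")
    expanded = []
    for cf, weight in zip(capacity_factors, weights):
        expanded.extend([cf] * weight)
    return sorted(expanded, reverse=True)


# Hypothetical reduced series: three representative time steps that
# stand for 2, 1 and 3 original hours respectively.
ldc = load_duration_curve([0.2, 0.9, 0.5], [2, 1, 3])
print(ldc)
```

The expanded curve has as many points as the original series, so it can be overlaid directly on the 1-hourly curve for comparison.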
Similarly, different time resolution reduction methods affect the shape of the load duration curve by emphasizing certain parts of it over others depending on the criteria used to choose the time steps. The method labeled number 2 in the figure ("min/max wind-demand weeks - drop") picks the weeks with the minimum and maximum difference between wind power output and demand, but it does not replicate the LDC of PV well. The extreme ends of the load duration curve are of potential interest as they represent periods of very low or very high production. For the wind time series, both the k-means and hierarchical clustering approaches (labeled 3 and 4) flatten the LDCs substantially, that is, they remove much of the extremes and skew the LDC towards the average. In the case of PV, the LDCs from k-means and hierarchical clustering generally lead to an overprediction of PV capacity factors throughout the curve. The figure also shows that some of the methods considered here skew the correlation between wind and PV generation quite significantly compared to the original 1-hourly data (shown in black).
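A heuristic of the kind described for method 2 can be sketched as below. The text only states that the weeks with the minimum and maximum difference between wind output and demand are picked; aggregating that difference as a weekly mean of hourly values, and the name `extreme_weeks`, are assumptions made for illustration.

```python
HOURS_PER_WEEK = 168


def extreme_weeks(wind, demand):
    """Return the indices of the weeks with the smallest and the largest
    mean difference between wind output and demand."""
    n_weeks = min(len(wind), len(demand)) // HOURS_PER_WEEK
    if n_weeks == 0:
        raise ValueError("need at least one full week of hourly data")
    diffs = []
    for week in range(n_weeks):
        start = week * HOURS_PER_WEEK
        hours = range(start, start + HOURS_PER_WEEK)
        diffs.append(sum(wind[h] - demand[h] for h in hours) / HOURS_PER_WEEK)
    min_week = min(range(n_weeks), key=diffs.__getitem__)
    max_week = max(range(n_weeks), key=diffs.__getitem__)
    return min_week, max_week


# Hypothetical data in GW: constant demand, wind output differs by week.
demand = [30.0] * (3 * HOURS_PER_WEEK)
wind = ([20.0] * HOURS_PER_WEEK      # week 0: moderate wind
        + [5.0] * HOURS_PER_WEEK     # week 1: very low wind
        + [40.0] * HOURS_PER_WEEK)   # week 2: wind exceeds demand
min_week, max_week = extreme_weeks(wind, demand)
print(min_week, max_week)
```

Because the selection criterion looks only at wind and demand, nothing in it constrains how well the chosen weeks represent PV, which is consistent with the poor PV load duration curve noted above.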

Model results and efficiency of different approaches
Above (Fig. 4), the method with the highest efficiency is a combination of selecting extreme days heuristically and clustering the remaining time periods (labeled 1 in the figure). Because this method represents LDCs and the PV-wind correlation worse than other methods, it is necessary to compare model results to judge the performance of different approaches.
Two model results are of particular interest: the deployed capacity of key technologies and the levelized cost of electricity (LCOE). Fig. 5a shows the installed capacity across the three different scenarios, for all 42 examined time reduction methods. As above, these results are from the 2014 model. Three key technologies are shown: offshore wind, PV, and the dispatchable fossil technology. Installed capacities for offshore wind range from just above 40 GW to 170 GW. It should be noted that a small number of time resolution reduction approaches result in more wind capacity than the 1-hourly reference case (shown with the thick line). In the case of PV capacity, this effect is even more noticeable. These are methods that choose extreme days only and disregard the rest of the time series data. If we run a capacity deployment model for only days with extreme wind conditions, the model may well deploy substantially more wind generation than necessary. Thus, these methods are simply not useful in practice when modeling very high shares of variable renewable generation. Fig. 5b shows the system-wide LCOE (generation-weighted average LCOE across technologies) computed by the 2014 model. The LCOE range is significantly smaller for the "50% renewable generation" scenario. This is because in this scenario, the comparatively low share of variable renewable generation and conversely the higher share of dispatchable generation mean that weather events, particularly extreme events for either wind or PV generation, have a lesser effect on the optimal model solution. The choice of time step resolution reduction method therefore plays a lesser role, and the range of results is narrower. Allowing for additional storage capacity in the "90% with storage" scenario also reduces the range of costs, by providing an alternate means for balancing variable generation and thus also reducing the importance of weather variability for the model solution.
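The system-wide LCOE used in Fig. 5b is a generation-weighted average across technologies. A minimal sketch of that weighting is given here; the function name `system_lcoe` and all cost and generation figures are hypothetical and do not correspond to model results.

```python
def system_lcoe(lcoe_by_tech, generation_by_tech):
    """Generation-weighted average LCOE across technologies."""
    total_generation = sum(generation_by_tech.values())
    if total_generation <= 0:
        raise ValueError("total generation must be positive")
    weighted_cost = sum(
        lcoe_by_tech[tech] * generation
        for tech, generation in generation_by_tech.items()
    )
    return weighted_cost / total_generation


# Hypothetical per-technology LCOE (currency units per kWh) and
# annual generation (TWh).
lcoe = {"offshore_wind": 0.10, "pv": 0.08, "fossil": 0.06}
generation = {"offshore_wind": 60.0, "pv": 20.0, "fossil": 20.0}
print(round(system_lcoe(lcoe, generation), 4))
```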
Overall, it appears that the method to reduce time resolution is of less importance unless very high shares (on the order of 90%) of variable renewables are modeled with little backup or balancing possibilities. From amongst the complete 42 methods, we now select those five time resolution adjustment methods with the highest efficiency (as defined in the methods section) when applied to the 2014 model. Table 6 shows these methods. The two topmost have already been used in the above section when examining LDCs and correlation between wind and PV. A similar table with results for all examined methods and for the "50% renewable generation" and "90% with storage" scenarios can be found in the supplementary material in Tables S2-S4. While Fig. 5 showed that some of the 42 methods perform inadequately on the accuracy of both installed capacities and levelized costs, Table 6 shows that certain approaches can achieve values very close to the 1-hourly reference case even with an order of magnitude fewer time steps. These results suggest a high time resolution is important for models that analyze systems with high shares of variable renewable generation. However, while there is a trade-off between accuracy and the number of time steps in the model, the relationship is not linear. Significant reductions in model complexity can be made by careful selection of time steps while still retaining high accuracy in the results.
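To make the ranking criterion concrete, the following sketch computes an efficiency score under one possible reading of the verbal definition, namely 1 / (number of time steps × relative LCOE deviation). The exact form is given in the methods section, which is not reproduced here, so this reading, the function name `efficiency`, and the example numbers are all assumptions.

```python
def efficiency(n_time_steps, lcoe, lcoe_reference):
    """Assumed reading: inverse of (time steps x relative LCOE deviation).

    Higher values mean the reference LCOE is approximated well
    with few time steps.
    """
    relative_deviation = abs(lcoe - lcoe_reference) / lcoe_reference
    if relative_deviation == 0:
        return float("inf")  # exact match with the reference case
    return 1.0 / (n_time_steps * relative_deviation)


# Hypothetical comparison against a reference LCOE of 0.100:
# a 168-step method that is 5% off, and a 1460-step method that is 2% off.
eff_few_steps = efficiency(168, 0.095, 0.100)
eff_many_steps = efficiency(1460, 0.098, 0.100)
print(eff_few_steps > eff_many_steps)
```

Under this reading a method with a somewhat larger LCOE error can still rank higher if it uses far fewer time steps, which is why efficiency and accuracy are reported separately.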

Variability of wind and PV generation
Examining the results from just a single year leaves open the question of whether a method proven as most viable in that single year is equally suitable for other years. Fig. 6 shows the interannual variability from 25 years of PV and wind simulations for the UK, from the simulated data used for the model runs presented here. There are seasonal trends for both wind and PV generation, but also considerable inter-annual variability. A sunny day in winter can be almost as productive for PV output across the UK as a cloudy day in summer. The average UK wind capacity factor can range from close to zero to almost one on the same calendar day across the 25 years of simulations shown. This implies an inadequacy in using merely a single year of data. When examining multiple years of data and time series longer than a single year, two distinct problems emerge. The first is a modeling problem and requires the application of the time resolution reduction approaches described above to more than a single year of data. The second is a planning and policy problem. Despite the inter-year variability in the renewable resource, power systems with high shares of variable renewables will likely become a reality, so energy modelers must propose solutions for their stable operation.
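The spread across years on a given calendar day can be computed with a few lines. This is a sketch with hypothetical names and data: `daily_range_across_years` expects one list of daily mean capacity factors per year, all of equal length, and the three short example years are invented.

```python
def daily_range_across_years(cf_by_year):
    """For each calendar day, return (min, max) of the daily mean
    capacity factor across all years."""
    series = list(cf_by_year.values())
    if not series:
        raise ValueError("need at least one year of data")
    n_days = len(series[0])
    if any(len(s) != n_days for s in series):
        raise ValueError("all years must cover the same number of days")
    return [(min(day), max(day)) for day in zip(*series)]


# Hypothetical daily mean wind capacity factors for three years
# (only four calendar days shown for brevity).
cf_by_year = {
    "year_a": [0.10, 0.55, 0.80, 0.30],
    "year_b": [0.95, 0.50, 0.05, 0.35],
    "year_c": [0.40, 0.60, 0.45, 0.25],
}
ranges = daily_range_across_years(cf_by_year)
print(ranges[0], ranges[2])
```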

Variability impact on time resolution reduction methods
To examine the modeling problem, Fig. 7 shows the percent deviation from the 1-hourly reference model runs for the system-wide LCOE, for three different time resolution reduction methods, over five years of model runs. A negative deviation means that the method is underestimating the LCOE relative to the 1-hourly model runs. The three methods were chosen to illustrate the wide range of behavior as follows. Method A (min/max solar and wind days - downsample 6 h) shows the lowest mean absolute LCOE error across the five years, method B (min/max wind-demand weeks - drop) has the highest mean efficiency across years, and method C (k-means - 5 days - closest) is chosen as an illustrative example of an approach with clustering only. This last approach does well in some years, but leads to substantial errors in other years. Thus, by validating a clustering method based on just a single year of data, one may erroneously conclude that it is more robust than it really is. Method A performs relatively consistently across all five years, most likely because it heuristically selects the extreme events that drive model results. However, this method is quite far from being efficient: it requires 1580 time steps. Method B, with only 168 time steps, is substantially more efficient computationally, but does not show the same consistent behavior over the five years compared.
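The comparison across years rests on two simple quantities, the signed percent deviation per year and its mean absolute value over all years. The sketch below uses hypothetical function names and invented LCOE values for three placeholder years; it is not the analysis code behind Fig. 7.

```python
def percent_deviation(lcoe_reduced, lcoe_reference):
    """Signed deviation in percent; negative means LCOE is underestimated."""
    return 100.0 * (lcoe_reduced - lcoe_reference) / lcoe_reference


def mean_absolute_deviation(reduced_by_year, reference_by_year):
    """Mean of the absolute percent deviations over all reference years."""
    deviations = [
        abs(percent_deviation(reduced_by_year[year], reference_by_year[year]))
        for year in reference_by_year
    ]
    return sum(deviations) / len(deviations)


# Hypothetical LCOE values for three placeholder years.
reference = {"y1": 0.100, "y2": 0.110, "y3": 0.105}
heuristic = {"y1": 0.098, "y2": 0.108, "y3": 0.103}   # small, steady error
clustering = {"y1": 0.100, "y2": 0.090, "y3": 0.120}  # exact in y1 only

mad_heuristic = mean_absolute_deviation(heuristic, reference)
mad_clustering = mean_absolute_deviation(clustering, reference)
print(round(mad_heuristic, 2), round(mad_clustering, 2))
```

In this invented example the clustering method looks perfect if validated on the first year alone, yet has the larger mean absolute deviation over all years, mirroring the caution expressed above about single-year validation.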
If we repeat the analysis shown in Table 6 above for the year 2012, as shown in Table 7, the resulting list of approaches is different. However, a consistent pattern is that methods with heuristic selection of the maximum and minimum solar and wind days ("min/max solar and wind days") are amongst those that work best.

Model planning results over multiple decades
The variability of input data across multiple years translates into significant differences in model results. Fig. 8 shows the range of installed generation capacities when running single-year models for each of the 25 years, for the "90% renewable generation" and the "90% with storage" scenarios. The considerable range of installed capacities casts doubt on model results using only a single or a small number of years of data. While allowing additional storage (Fig. 8b) reduces the absolute capacity as well as the range somewhat by providing an option to balance weather variability, the model results still show a considerable range of capacity. Fig. 9 shows installed offshore wind and PV capacities from each of the 25 years separately, for the "90% renewable generation" scenario, sorted by wind capacity. Wind capacity is relatively equally distributed throughout the range spanned by the 25 years.
The year 1991 could be considered an outlier, and discarded as an exceptional extreme condition, or it could be considered a condition that energy planning needs to address. Even when discarding this outlier, however, the remaining years still span a 30 GW range in installed capacity. By breaking out installed offshore wind capacity into the different model zones, across the 25 years, as seen in Fig. 10, we see that this variability stems from only a few model locations. For robust capacity planning, therefore, these locations in particular would be of interest.
Finally, we examine the results from applying different time resolution reduction methods to the full 25 year time series (see Table 4 for the methods used for this). Fig. 11 shows the results in a similar manner as the boxplots of deployed capacity above, but additionally showing as dots the individual results contained within the boxplots. The range of installed generation capacities is even larger than the range from running individual years, which would be expected given that some of the time resolution reduction methods specifically select (and thus focus on) extreme events from the data. For the purposes of robust capacity planning in the face of inter-annual variability, this could imply that certain approaches to selecting time steps are suitable to determine a "mean" system design across all years. However, there is also the question of weighting the rarer extreme events. By having too high a weight, they will skew model results towards overcapacity. Which of the resulting capacities is "true" is not a straightforward determination, and these results also do not give a clear answer to the question of how an energy model can determine a robust planning approach in the face of inter-annual variability whilst also working with a reduced set of time steps drawn from a longer time series.

Discussion and conclusion
This paper has compared different methods to reduce the time resolution of energy model input time series, and assessed their effect on the outputs from a power system planning model for Great Britain. The results show that different methods, including downsampling, heuristic selection of time steps, and clustering, lead to substantially different model results, in particular when modeling high shares of variable renewable generation. Approaches including heuristic methods appear to be more stable when applied across different models of individual years, and may therefore be preferable for those with high shares of variable renewables. In the "50% renewable generation" scenario (with a constraint that at least 50% of total generation over the entire modeled period must come from wind and PV), model results still showed acceptable accuracy at resolutions of 6-hourly and lower, when comparing to the reference 1-hourly model.
The "90% renewable generation" scenario substantially increased the difference in results between different numbers of time steps and time resolution reduction methods. This underscores that modeling renewables adequately requires high resolution input data, but also that documenting the processing steps applied to this input data in the course of any given analysis is important to understand and evaluate model results. Results pertaining to installed capacity requirements, levelized costs and the overall feasibility of certain power or energy system configurations should be used with caution when high shares of renewables are modeled with time resolutions much below 1-hourly or with only a small number of typical days. Giving a model additional inter-time step balancing opportunities like storage further decreases the optimization's computational tractability, but it also reduces the importance of high time resolution. Overall, the results suggest that the appropriate method with which to reduce the number of time steps in a model depends on the model setup.
Rather than falling back on the same method for all model runs, studies should therefore include justification for why a specific approach was chosen and validate that method for their specific model configuration.
The situation is further complicated by the substantial inter-annual variability in wind and PV power output. This variability results in the need to use not just a single year (for example, a typical meteorological year) but as long a time series as possible, ideally several decades. The resulting modeling problem is how to deal with the increased amount of data while maintaining computational tractability. It is also a planning problem, since it involves the question of how much variable generation capacity to build, and where, to deal with this variability. Furthermore, it entails setting criteria to apply for acceptable levels of supply security and system stability. These security and stability requirements are influenced by the variability of renewable generation and the availability and cost of backup and balancing options for variable renewables. Assessing them therefore requires that model results not be distorted by inaccurate time series data. For all these reasons, there is the need for further research into how to integrate multi-decade time series of renewable generation into energy planning models, so that they in turn can help plan for secure and affordable energy systems with high shares of variable generation.
The work presented here has several limitations. The optimization model used is a simplified one and does not therefore lead directly to real-world planning insights. The resulting levelized costs, for example, for the "90% renewable generation" and "90% with storage" scenarios, should not be interpreted as representative of the costs of real power systems, but are used to compare the relative differences between model runs with different time step configurations. Furthermore, the relevance of spatial detail is not further examined in this work. Temporal resolution is important for the balancing of supply and demand and for depicting the variability of wind and PV. Spatial resolution allows a model to capture dynamics such as a weather system moving across a country and the impact this has on variable renewable generation in different model locations. Depending on a model's purpose (for example, examining decentralized generation), higher spatial resolution may be of crucial importance. The analyses performed here cannot quantify the effect of spatial resolution or its relative importance compared to temporal resolution, so this is another avenue for future work.
This study has assessed methods to address time resolution systematically, extending knowledge on what methods are appropriate and where, and underscoring the fact that published results based on models should always carefully specify what adjustments were made to the time series data. The methods and data are freely available to build upon. This makes it possible for other modelers to easily adapt or extend the approaches presented here, and apply them to their own models and data. The results suggest that while heuristic approaches appear promising, there is no one-size-fits-all approach to reduce time resolution while also covering long-term variability. However, given the rising importance of variable renewable generation, there is both a need and ample room for more research on these problems.