The strong effect of network resolution on electricity system models with high shares of wind and solar

Energy system modellers typically choose a low spatial resolution for their models based on administrative boundaries such as countries, which eases data collection and reduces computation times. However, a low spatial resolution can lead to sub-optimal investment decisions for wind and solar generation. Ignoring power grid bottlenecks within regions tends to underestimate system costs, while combining locations with different wind and solar capacity factors in the same resource class tends to overestimate costs. We investigate these two competing effects in a capacity expansion model for Europe's power system with a high share of renewables, taking advantage of newly-available high-resolution datasets as well as computational advances. We vary the number of nodes, interpolating between a 37-node model based on country and synchronous zone boundaries, and a 1024-node model based on the location of electricity substations. If we focus on the effect of renewable resource resolution and ignore network restrictions, we find that a higher resolution allows the optimal solution to concentrate wind and solar capacity at sites with better capacity factors and thus reduces system costs by up to 10% compared to a low resolution model. This results in a big swing from offshore to onshore wind investment. However, if we introduce grid bottlenecks by raising the network resolution, costs increase by up to 23% as generation has to be sourced more locally at sites with worse capacity factors. These effects are most pronounced in scenarios where grid expansion is limited, for example, by low local acceptance. We show that allowing grid expansion mitigates some of the effects of the low grid resolution, and lowers overall costs by around 16%.


Introduction
Electricity systems with high shares of wind and solar photovoltaic generation require a fundamentally different kind of modelling to conventional power systems with only dispatchable generation [63]. While investments in conventional power plants can be dimensioned according to simple heuristics like screening curves [10], the assessment of wind and solar resources requires a high temporal and spatial resolution to capture their weather-driven variability. The need to assess investments in generation, transmission and flexibility options over thousands of representative weather and demand situations, as well as over thousands of potential locations, means that balancing model accuracy against computational resources has become a critical challenge.
The effects of temporal resolution have been well researched in the electricity system planning literature [12], including the need for at least hourly modelling resolution [63], the consequences of clustering representative conditions [41], and the need to include extreme weather events [50]. On the spatial side, it has been recognized that integrating renewable resources on a continental scale can smooth large-scale weather variations, particularly from wind [23], and avoid the need for temporal balancing. This smoothing effect has been found in studies of the benefits of grid expansion both in Europe, where the impact on balancing needs [53] and storage requirements [55] has been analysed, and in the United States [44]. However, there has been little research on the effects of spatial resolution on planning results. This is partly due to the fact that collecting high-resolution spatial data is challenging, as well as the fact that optimization at high resolution over large areas is computationally demanding.
⋆ This document is the result of the research project funded by the Helmholtz Association under grant no. VH-NG-1352. Corresponding author: martha.frysztacki@kit.edu (Martha Maria)
Choosing the spatial resolution based on administrative boundaries such as country borders, which is a common approach in the literature [23,53,31], fails to account for the variation of resources inside large countries like Germany. Aggregating low-yield sites together with high-yield sites takes away the opportunity to optimize generation placement, which distorts investment decisions and drives up costs.
On the other hand, aggregating diverse resources to single points tends to underestimate network-related costs, since the models are blind to network bottlenecks that might hinder the welfare-enhancing integration of renewable resources located far from demand centers. The effects of network restrictions are all the more important given the apparent low public acceptance for new overhead transmission lines, observed in Germany [30] and across Europe [20], and the long planning and construction times for new grid infrastructure [26].
In the present contribution we introduce a novel methodology to disentangle these two competing spatial effects of resource and network resolution, so that for the first time their different impacts on system costs and technology choices can be quantified. We then demonstrate the methodology by running simulations in a model of the future European electricity system with a higher spatial resolution than has previously been achieved in the literature. We optimize investments and operation of generation, storage and transmission jointly in a system with a high share of renewables under a 95% reduction in CO2 emissions compared to 1990, which is consistent with European targets for 2050 [25]. A recently developed, high-resolution, open-source model of the European transmission network, PyPSA-Eur [37], is sequentially clustered from 1024 nodes down to 37 nodes in order to examine the effects on optimal investments in generation, transmission and storage.
Previous work in the engineering literature has focused on the effect of different network clustering algorithms [40] on the flows in single power flow simulations [11,33], or used clustering algorithms that are dependent on specific dispatch situations [18,62,59] and therefore unsuitable when making large changes to generation and transmission capacities. In the planning literature that considers a high share of renewables in the future energy system, the effects of clustering applied separately to wind, solar and demand were investigated in [61], but neglected potential transmission line congestion within large regions. In [43] the previous study was extended by including a synthesized grid and renewable profiles, but it ignored the existing topology of the transmission grid. Effects of varying the resolution were not considered in either of the studies. Recent work has examined regional solutions for the European power system, but did not take into account existing transmission lines, potential low public acceptance for grid reinforcement or the grid flow physics [67]. Other studies have examined transmission grid expansion at substation resolution, but either the temporal resolution was too low to account for wind and solar variability [24,35], or only single countries were considered [46,1,35], or transmission expansion was not co-optimized with generation and storage [24,15,58]. The competing effect of clustering transmission lines versus variable resource sites on the share of renewables was also discussed in [21], but the report did not provide an analysis of how strongly the respective clustering impacts modeling and planning results. 
The effects of model resolution on system planning results were considered for the United States in [42], where a cost benefit was seen for higher wind and solar resolution, but the resource resolution was not separated from the network resolution, and only a small number of time slices were considered to represent weather variations.
Advances in solver algorithms and code optimization in the modelling framework PyPSA [13], as well as hardware improvements, allow us to achieve what was previously not possible in the literature: the co-optimization of transmission, generation and storage at high temporal and spatial resolution across the whole of Europe, while taking into account linearized grid physics, existing transmission lines and realistic restrictions on grid reinforcement. In previous work by some of the authors large effects of spatial resolution on investment results were seen [36], but because the resource and network resolution were changed in tandem, it was not possible to analyse which effect dominates the results. In the present contribution we present a novel study design that separates the effects of resource and network resolution, and demonstrate the substantial differences between the two effects using the high-resolution simulations enabled by recent software and hardware advances.

Methods
In this section we present an overview of the underlying model and the study design, before providing more details on the clustering methodology and the investment optimisation. A list of notation is provided in Table 2.

Model input data
The study is performed in a model of the European electricity system at the transmission level, PyPSA-Eur, which is fully described in a separate publication [37]. Here we give a brief outline of the input data.
The PyPSA-Eur model shown in Figure 1 contains all existing high-voltage alternating current (HVAC) and direct current (HVDC) lines in the European system, as well as those planned by the European Network of Transmission System Operators for Electricity (ENTSO-E) in the Ten Year Network Development Plan (TYNDP) [26]. The network topology and electrical parameters are derived from the ENTSO-E interactive map [3] using a power grid extraction toolkit [69]. In total the network consists of 4973 nodes, 5721 HVAC and 32 HVDC lines existing as of 2018, as well as 279 HVAC and 29 HVDC planned lines.
Historical hourly load data for each country are taken from the Open Power System Data project [5] and distributed to the nodes within each country according to population and gross domestic product data. Generation time series are provided for the surrounding wind and solar plants based on historical wind and insolation data derived from the ERA5 reanalysis dataset [4] and the SARAH2 surface radiation dataset [51]. Renewable installation potentials are based on land cover maps, excluding for example nature reserves, cities or streets.
Table 1 (excerpt): Case 2, clustering on siting resolution: fix the transmission network to one node per country-zone (37 nodes) and increase the number of generation and storage sites. Case 3, clustering on network nodes: maintain a high resolution of generation sites (1024) and successively increase the number of transmission nodes.
The model was partially validated in [37]. Further validation against historical data was carried out in [28], where it was shown that the model could reproduce curtailment of wind and solar in Germany due to transmission bottlenecks in the years 2013-2018. The ability to reproduce historical congestion provides a strong check on the match between the transmission network data and the availability of wind and solar generation in the model.

Clustering study design
The nodes of the model are successively clustered in space into a smaller number of representative nodes using the k-means algorithm [34]. This groups close-by nodes together, so that, for example, multiple nodes representing a single city are merged into one node. Nodes from different countries or different synchronous zones are not allowed to be merged; to achieve this, the overall number of desired nodes is partitioned between the countries and synchronous zones before the k-means algorithm is applied in each partition separately. In total there are 37 'country-zones' in the model, i.e. regions of countries belonging to separate synchronous zones. Figure 2, Case 1 shows the results for Ireland and the United Kingdom (where Northern Ireland is in a separate synchronous zone to Great Britain). Once the nodes have been clustered, they are reconnected with transmission corridors representing the major transmission lines from the high-resolution model. Electricity demand, conventional generation and storage options are also aggregated to the nearest network node. More technical details on the clustering can be found in subsection 2.5. An analysis of the effects of clustering on the network flows can be found in the Appendix, Section A.1.

Resource versus network resolution case studies
To separate the effects of the spatial resolution on the renewable resources and the network, we consider three cases in which they are clustered differently. The three cases are summarized in Table 1 and shown graphically in Figure 2 for each case (rows) and for each level of clustering (columns).
In Case 1 the wind and solar sites are clustered to the same resolution as the network. The number of clusters is varied between 37, the number of country-zones, and 1024, which represents the maximum resolution for which generation, transmission and storage investment can be co-optimized in reasonable time. The number of nodes is increased in half-powers of 2, so that nine different resolutions are considered.
In Case 2 network bottlenecks inside each country-zone are removed so that there are only 37 transmission nodes, and only the resolution of the wind and solar generation is varied. Inside each country-zone, all wind and solar generators are connected to the central node. This allows the optimization to exploit the best wind and solar sites available.
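As a rough illustration of the half-power spacing used in Case 1, the sketch below lists node counts of the form floor(2^(k/2)). The exponent range 12 to 20 is an assumption, chosen only because it yields nine values ending at 1024; the exact list of resolutions is not recoverable from the text here.

```python
import math

def half_power_node_counts(k_min=12, k_max=20):
    """Node counts floor(2**(k/2)) for integer k; k_min and k_max are assumed."""
    return [math.floor(2 ** (k / 2)) for k in range(k_min, k_max + 1)]

counts = half_power_node_counts()
print(counts)
```

Under this assumption the list contains the 181 and 256 cluster resolutions that are mentioned later in the results.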
Finally in Case 3 we fix a high resolution of renewable sites and vary the number of network nodes, in order to explore the effects of network bottlenecks. Each renewable site is connected to the nearest network node, where the transmission lines, electricity demand, conventional generators and storage are also connected.
For each case we optimize investments and operation for wind and solar power, as well as open cycle gas turbines, batteries, hydrogen storage and transmission. Flexibility from existing hydroelectric power plants is also taken into account. The model is run with perfect foresight at a 3-hourly temporal resolution over a historical year of load and weather data from 2013, assuming a 95% reduction in CO2 emissions compared to 1990. The temporal resolution is 3-hourly to capture changes in solar generation and electricity demand while allowing reasonable computation times. The technology selection is also limited for computational reasons. More details on the investment optimization can be found in subsection 2.6.
For each simulation we also vary the amount of new transmission that can be built, in order to understand the effect of possible grid reinforcements on the results. The model is allowed to optimize new transmission reinforcements to the grid as it was in 2018, up to a limit on the sum over new capacity multiplied by line length measured relative to the grid capacity in 2018. For example, a transmission expansion of 25% means that on top of 2018's grid, new lines corresponding to a quarter of 2018's grid can be added to the network. The exact constraint is given in equation (17) in subsection 2.6.

Network preparation
Before the clustering algorithm can be applied to the network, several simplifications are applied to the data.
In order to avoid the difficulty of keeping track of different voltage levels as the network is clustered, all lines are mapped to their electrical equivalents at 380 kV, the most prevalent voltage in the European transmission system. If the original reactance of the line $\ell$ was $x_\ell$ at its original voltage $v_\ell$, the new equivalent reactance becomes
$$x'_\ell = x_\ell \left( \frac{380\,\text{kV}}{v_\ell} \right)^2 . \qquad (1)$$
This guarantees that the per unit reactance is preserved after the equivalencing. The impedances and thermal ratings of all transformers are neglected, since they are small and cannot be consistently included with the mapping of all voltage levels to 380 kV.
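The voltage equivalencing can be sketched in a few lines. The function name and the example line (220 kV, 10 ohm) are illustrative assumptions rather than values from the model; the check confirms that the ratio of reactance to squared voltage, and therefore the per unit reactance on a common power base, is unchanged.

```python
def equivalent_reactance(x_ohm, v_kv, v_ref_kv=380.0):
    """Map a line reactance to its equivalent at the reference voltage.

    Scaling by (v_ref / v)**2 keeps x / v**2 constant, so the per unit
    reactance on a common power base is preserved.
    """
    return x_ohm * (v_ref_kv / v_kv) ** 2

# Hypothetical 220 kV line with a 10 ohm reactance (illustrative values only).
x_old, v_old = 10.0, 220.0
x_new = equivalent_reactance(x_old, v_old)
print(round(x_new, 3))
```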
Univalent nodes, also known as dead-ends, are removed sequentially until no univalent nodes exist. That is, if node $i$ has no other neighbor than node $j$, then node $i$ is merged into node $j$. We repeat the process until each node is multi-valent and update the merged node attributes and its attached assets (loads, generators and storage units) according to the rules in Table 5.
HVDC lines in series or parallel are simplified to a single line using the rules in Tables 6 and 7. Capital costs per MW of capacity for HVDC lines $\ell$ with length $l_\ell$ and a fraction $u_\ell \in [0, 1]$ underwater are given by
$$c_\ell = 1.25\, l_\ell \left( u_\ell\, c_{\text{marine}} + (1 - u_\ell)\, c_{\text{ground}} \right),$$
where $c_{\text{marine}}$ is the capital cost for a submarine connection and $c_{\text{ground}}$ for an underground connection. The factor of 1.25 accounts for indirect routing and height fluctuations.
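The sequential removal of dead-ends can be sketched on a plain adjacency dictionary. This is a minimal illustration under assumed data structures (a dict of neighbour sets and hypothetical node names), not the implementation used in the model; the aggregation of attached assets according to Table 5 is left out.

```python
def remove_dead_ends(adjacency):
    """Sequentially merge univalent nodes into their only neighbour.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    Returns (reduced adjacency, merged_into), where merged_into maps each
    removed node to the surviving node that absorbs its assets.
    """
    adj = {n: set(nbrs) for n, nbrs in adjacency.items()}
    merged_into = {}
    stack = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    while stack:
        node = stack.pop()
        if node not in adj or len(adj[node]) != 1:
            continue  # already removed, or no longer a dead-end
        (neighbour,) = adj[node]
        del adj[node]
        adj[neighbour].discard(node)
        merged_into[node] = neighbour
        if len(adj[neighbour]) == 1:
            stack.append(neighbour)  # the neighbour became a dead-end itself

    def final_target(n):
        # Follow chains of merges until a surviving node is reached.
        while n in merged_into:
            n = merged_into[n]
        return n

    return adj, {n: final_target(n) for n in merged_into}

# Hypothetical network: a triangle A-B-C with a two-node spur C-D-E.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
           "D": {"C", "E"}, "E": {"D"}}
reduced, merged = remove_dead_ends(network)
```

In this example the spur nodes D and E are both absorbed by C, while the meshed triangle is left untouched.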

Clustering methodology
Different methods have been used to cluster networks in the literature. We chose a version of k-means clustering [34] based on the geographical location of the original substations in the network, weighted by the average load and conventional capacity at the substations, since this represents how the topology of the network was historically planned to connect major generators to major loads. It leaves the long transmission lines between regions, which are expensive to upgrade and are more likely to encounter low local acceptance, unaggregated, so that these lines can be optimized in the model. Regions with a high density of nodes, for example around cities, are aggregated together, since the short lines between these nodes are inexpensive to upgrade and rarely present bottlenecks. Geographical k-means clustering has the advantage over other clustering methods of not making any assumptions about the future generation, storage and network capacity expansion.
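To make the idea of a weighted geographical k-means concrete, the following sketch implements Lloyd's algorithm with per-node weights in pure Python. It is illustrative only: it treats coordinates as planar, initialises the centroids with the first k points, and uses made-up coordinates and weights, none of which should be read as the settings of the actual study.

```python
def weighted_kmeans(points, weights, k, n_iter=100):
    """Lloyd's algorithm with per-point weights on 2-D coordinates.

    Initial centroids are the first k points (an illustrative choice).
    """
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: weighted mean position of the members of each cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total > 0:
                centroids[c] = (
                    sum(weights[i] * points[i][0] for i in members) / total,
                    sum(weights[i] * points[i][1] for i in members) / total,
                )
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids

# Hypothetical substations; the heavily weighted node pulls its centroid.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0)]
weights = [1.0, 1.0, 3.0, 1.0, 1.0]
labels, centroids = weighted_kmeans(points, weights, k=2)
```

The centroid of the first cluster lands closer to the node with weight 3, which mirrors how large loads and generators attract the representative nodes.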
Other clustering methods applied in the literature are not suitable for the co-optimization of supply and grid technologies: these include clustering based on electrical distance using k-medoids [11,22], a modified version of k-medoids to avoid assigning both end nodes of a critical branch to the same zone [2], hierarchical clustering [9], or -decomposition and eigenvector partitioning [66] (which we do not use because we want to optimize new grid reinforcements that alter electrical distances), spectral partitioning of the graph Laplacian matrix [33] (avoided for the same reason), an adaptation of k-means called k-means++ combined with a max-p regions algorithm applied to aggregate contiguous sites with similar wind, solar and electricity demand [61] (avoided since we want a coherent clustering of all network nodes and assets), hierarchical clustering based on a database of electricity demand, conventional generation and renewable profiles including a synthesized grid [43] (avoided for the same reason and because we do not want to alter the topology of the existing transmission grid), k-means clustering based on renewable resources as well as economic, sociodemographic and geographical features [19] (avoided because we need a clustering focused on network reduction), as well as clustering based on zonal Power Transfer Distribution Factors (PTDFs) to detect congestion zones [18], to yield the same flow patterns as the original network [49] or to analyse policy options and emissions [60] (avoided because they encode electrical parameters that change with reinforcement), Available Transfer Capacities (ATCs) [59] (avoided because they depend on pre-defined dispatch patterns) and locational marginal prices (LMPs) [62] (again avoided because they depend on pre-defined dispatch patterns).
We do not allow nodes in different countries or different synchronous zones to be clustered together, so that we can still obtain country-specific results and so that all HVDC lines between synchronous zones are preserved during the aggregation. This results in a minimum number of 37 clustered nodes for the country-zones. First we partition the desired total number of clusters between the 37 country-zones, then we apply the k-means clustering algorithm within each country-zone.
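In order to partition the nodes between the 37 country-zones, the following minimisation problem is solved
$$\min_{\{n_z\} \in \mathbb{N}^{37}} \sum_{z=1}^{37} \left( n_z - \frac{L_z}{\sum_{y} L_y}\, n \right)^2 ,$$
where $L_z$ is the total load in each country-zone $z$ and $n$ is the desired number of clusters for the whole network. An additional constraint ensures that the number of clusters per country-zone matches the desired number of clusters for the whole network: $\sum_z n_z = n$.
Then the k-means algorithm is applied to partition the nodes inside each country-zone $z$ into $n_z$ clusters. The algorithm finds the partition that minimizes the sum of squared distances, weighted by $w_i$, from the mean position $x_c \in \mathbb{R}^2$ of each cluster $c$ to the positions $x_i \in \mathbb{R}^2$ of its members $i \in N_c$:
$$\min_{\{x_c\}} \sum_{c=1}^{n_z} \sum_{i \in N_c} w_i \lVert x_c - x_i \rVert_2^2 .$$
Each node is additionally assigned a normalised weighting $w_i$ based on its nominal power for conventional generators and averaged load demand:
$$w_i = \frac{\sum_{s \in \text{conv}} G_{i,s}}{\sum_{j} \sum_{s \in \text{conv}} G_{j,s}} + \frac{d_{i,T}}{\sum_{j} d_{j,T}} ,$$
where $d_{i,T}$ corresponds to the averaged demand over the considered time period $T$.
The optimization is run with $n_{\text{init}} = 10^3$ different centroid seeds, a maximum number of iterations for a single run of $\text{max}_{\text{iter}} = 3 \cdot 10^4$ and a relative tolerance with regards to inertia to declare convergence of $\varepsilon = 10^{-6}$.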
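The allocation of clusters to country-zones in proportion to load can be sketched with a simple greedy scheme. This is a minimal illustration, not the solver used in the study: it assumes every zone keeps at least one cluster, and the zone names and loads are hypothetical. Because the least-squares objective is separable and convex in each zone's cluster count, adding clusters one at a time where the objective falls the most gives an optimal integer allocation.

```python
def allocate_clusters(zone_loads, n_total):
    """Split n_total clusters between zones in proportion to their load.

    Minimises sum_z (n_z - share_z * n_total)**2 over integers n_z >= 1
    with sum_z n_z = n_total, using greedy marginal allocation.
    """
    if n_total < len(zone_loads):
        raise ValueError("need at least one cluster per zone")
    total_load = sum(zone_loads.values())
    target = {z: load / total_load * n_total for z, load in zone_loads.items()}
    n = {z: 1 for z in zone_loads}  # every zone keeps at least one node
    for _ in range(n_total - len(zone_loads)):
        # Change of (n_z - target_z)**2 when n_z grows by one.
        best = min(n, key=lambda z: 2 * (n[z] - target[z]) + 1)
        n[best] += 1
    return n

# Hypothetical loads for three zones (arbitrary units).
allocation = allocate_clusters({"DE": 500.0, "FR": 450.0, "DK": 50.0}, 10)
print(allocation)
```

In the example the smallest zone keeps its single node even though its proportional share would be only half a cluster.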
Attributes of the nodes in $N_c$ and their attached assets are aggregated to the clustered node $c$ according to the rules in Table 5.
Lines connecting nodes in cluster $c$ with nodes in cluster $d$, given by the set $N_{c,d}$, are aggregated to a single representative line. The length of the representative line is determined using the haversine formula (which computes the great-circle distance between two points on a sphere) multiplied by a factor of 1.25 to take indirect routing into account. The representative line inherits the attributes of the lines $N_{c,d}$ as described in Table 7. If any of the replaced lines in $N_{c,d}$ had the attribute that their capacity was extendable, then the aggregated line inherits this extendability.
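The length of a representative line can be computed as in the sketch below. The Earth radius of 6371 km and the function names are assumptions for illustration; the routing factor of 1.25 is the one stated in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed value

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def representative_line_length_km(lon1, lat1, lon2, lat2, routing_factor=1.25):
    """Great-circle distance scaled up to account for indirect routing."""
    return routing_factor * haversine_km(lon1, lat1, lon2, lat2)

# One degree of longitude along the equator, as a simple sanity check.
direct = haversine_km(0.0, 0.0, 1.0, 0.0)
routed = representative_line_length_km(0.0, 0.0, 1.0, 0.0)
```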
An analysis of the effects of clustering on the network flows can be found in the Appendix, Section A.1.
For Case 1, generators are clustered to the same resolution as the network. Time series containing hourly resolved capacity factors $\bar g_{i,s,t} \in [0, 1]$ for variable renewable generation are aggregated using a weighted average
$$\bar g_{c,s,t} = \frac{1}{\sum_{i \in N_c} w_{i,s}} \sum_{i \in N_c} w_{i,s}\, \bar g_{i,s,t} . \qquad (6)$$
The resulting capacity factor $\bar g_{c,s,t}$ is in $[0, 1]$ by definition. For renewables, the weighting $w_{i,s}$ is proportional to the maximal yearly yield for technology $s$ at node $i$, found by multiplying the maximal installable capacity $G^{\max}_{i,s}$ with the average capacity factor. In the case of conventional technologies the weightings are distributed equally, i.e. $w_{i,s} = 1$. Note that there is no relation between the weightings $w_{i,s}$ and the bus weightings of (4).
For Case 2, the network is fixed at 37 nodes, and the wind and solar generators are merged in the aggregation step. Time series for VRE availability are aggregated according to (6) to their respective resolution.
For Case 3, the network is clustered, but wind and solar generators are not merged in the aggregation step. Their time series remain fixed at high resolution of 1024 nodes.
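The weighted averaging of capacity factor time series used when sites are merged (Cases 1 and 2) can be sketched as follows. The site names, profiles and capacities are made up for illustration; the weights follow the description in the text, i.e. maximal installable capacity times average capacity factor.

```python
def aggregate_capacity_factors(profiles, max_capacity):
    """Weighted average of per-site capacity factor time series.

    profiles: dict site -> list of capacity factors in [0, 1]
    max_capacity: dict site -> maximal installable capacity (MW)
    Each site is weighted by its maximal capacity times its mean capacity factor.
    """
    weights = {
        site: max_capacity[site] * sum(series) / len(series)
        for site, series in profiles.items()
    }
    total = sum(weights.values())
    n_steps = len(next(iter(profiles.values())))
    return [
        sum(weights[s] * profiles[s][t] for s in profiles) / total
        for t in range(n_steps)
    ]

# Hypothetical cluster with one windy coastal site and one large inland site.
profiles = {"coast": [0.8, 0.4], "inland": [0.2, 0.2]}
max_capacity = {"coast": 100.0, "inland": 300.0}
merged_profile = aggregate_capacity_factors(profiles, max_capacity)
```

Here both sites have the same potential yearly yield, so the merged profile is the plain mean of the two series. The good coastal capacity factors are diluted, which is the effect that makes onshore wind look less attractive at low resolution.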

Investment optimisation
Investments in generation, storage and transmission are optimized in the PyPSA modelling framework [13], which minimises the total system costs. The objective function is
$$\min_{G_{i,s},\, F_\ell,\, g_{i,s,t}} \left[ \sum_{i,s} c_{i,s}\, G_{i,s} + \sum_{i,s,t} w_t\, o_{i,s}\, g_{i,s,t} + \sum_{\ell} c_\ell\, F_\ell \right],$$
consisting of the annualised fixed costs $c_{i,s}$ for capacities $G_{i,s}$ at each node $i$ and storage/generation technology $s$, the dispatch $g_{i,s,t}$ of the unit at time $t$ and associated variable costs $o_{i,s}$ multiplied by a weight factor $w_t$ corresponding to the temporal resolution of the system, and the line capacities $F_\ell$ for each line $\ell$, including both high voltage alternating current and direct current lines, and their annualised fixed costs $c_\ell$. The time period runs over a full year at a 3-hourly resolution, so each time period is weighted with $w_t = 3$. Investment cost assumptions are provided in Table 3, based on projections for the year 2030. Assumptions are based on [7] for wind technologies, [57] in case of OCGT, PHS, hydro, run-of-river, [16] for storage technologies and [68] for solar technologies. 2030 is chosen for the cost projections since this is the earliest possible time that such a system transformation might be feasible, and because it results in conservative cost assumptions compared to projections for a later date. The only CO2-emitting generators are the open cycle gas turbines with natural gas with specific emissions 0.187 tCO2/MWh_th and fuel cost 21.6 €/MWh_th. Investment costs are annualized with a discount rate of 7%. Lifetimes, efficiencies and operation and maintenance costs can be found in the GitHub repository [8].
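The text states that investment costs are annualised with a 7% discount rate but does not spell out the formula. The sketch below assumes the standard annuity factor, which is a common choice in this kind of model; the overnight cost and lifetime in the example are hypothetical, and fixed operation and maintenance costs are not included.

```python
def annuity_factor(lifetime_years, discount_rate=0.07):
    """Standard annuity factor r / (1 - (1 + r)**-n); an assumed formula."""
    if discount_rate == 0:
        return 1.0 / lifetime_years
    return discount_rate / (1.0 - (1.0 + discount_rate) ** -lifetime_years)

def annualised_cost(overnight_cost, lifetime_years, discount_rate=0.07):
    """Annual payment that repays the overnight cost over the lifetime."""
    return overnight_cost * annuity_factor(lifetime_years, discount_rate)

# Hypothetical asset: 1000 EUR/kW overnight cost, 25-year lifetime.
yearly = annualised_cost(1000.0, 25)
print(round(yearly, 2))
```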
The dispatch of conventional generators $g_{i,s,t}$ is constrained by their capacity $G_{i,s}$:
$$0 \le g_{i,s,t} \le G_{i,s} .$$
The maximum producible power of renewable generators depends on the weather conditions, which is expressed as an availability $\bar g_{i,s,t}$ per unit of its capacity:
$$0 \le g_{i,s,t} \le \bar g_{i,s,t}\, G_{i,s} .$$
The installable renewable capacity $G_{i,s}$ is constrained by land eligibility for placing e.g. wind turbines or solar panels in each node and for each renewable technology. The land restrictions are derived using the Geospatial Land Availability for Energy Systems (GLAES) tool [54] and are always finite for renewable carriers:
$$G_{i,s} \le G^{\max}_{i,s} < \infty .$$
There is no capacity constraint for conventional generators:
$$G_{i,s} < \infty .$$
The energy levels $e_{i,s,t}$ of all storage units have to be consistent between all hours and are limited by the storage energy capacity $E_{i,s}$:
$$e_{i,s,t} = \eta_0^{w_t}\, e_{i,s,t-1} + w_t \left( \eta_1 \left[ g_{i,s,t} \right]^- - \eta_2^{-1} \left[ g_{i,s,t} \right]^+ + g^{\text{inflow}}_{i,s,t} - g^{\text{spillage}}_{i,s,t} \right), \qquad 0 \le e_{i,s,t} \le E_{i,s} .$$
Positive and negative parts of a value are denoted as $[\cdot]^+ = \max(\cdot, 0)$, $[\cdot]^- = -\min(\cdot, 0)$. The storage units can have a standing loss $\eta_0$, a charging efficiency $\eta_1$, a discharging efficiency $\eta_2$, inflow (e.g. river inflow in a reservoir) and spillage. The energy level is assumed to be cyclic, i.e. $e_{i,s,t=0} = e_{i,s,t=T}$.
CO2 emissions are limited by a cap $\text{CAP}_{\text{CO}_2}$, implemented using the specific emissions $\rho_s$ in CO2-tonne-per-MWh of the fuel $s$ and the efficiency $\eta_{i,s}$ of the generator:
$$\sum_{i,s,t} \frac{\rho_s}{\eta_{i,s}}\, w_t\, g_{i,s,t} \le \text{CAP}_{\text{CO}_2} .$$
In all simulations this cap was set at a reduction of 95% of the electricity sector emissions from 1990.
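The storage consistency condition can be illustrated by stepping the energy level through a dispatch series. In this sketch positive dispatch is discharging and negative dispatch is charging, matching the positive and negative parts defined above. The efficiencies, the treatment of the standing loss as an hourly retention factor, and the example dispatch are assumptions for illustration only.

```python
def storage_levels(dispatch, e_start=0.0, w=3.0, eta0=1.0, eta1=0.9, eta2=0.9,
                   inflow=None, spillage=None):
    """Energy levels (MWh) from a dispatch series in MW.

    dispatch > 0 means discharging, dispatch < 0 means charging.
    w is the duration of each time step in hours; eta0 is assumed to be
    the fraction of stored energy retained per hour.
    """
    n = len(dispatch)
    inflow = inflow or [0.0] * n
    spillage = spillage or [0.0] * n
    levels, e = [], e_start
    for g, g_in, g_spill in zip(dispatch, inflow, spillage):
        charge = max(-g, 0.0)     # negative part of the dispatch
        discharge = max(g, 0.0)   # positive part of the dispatch
        e = (eta0 ** w) * e + w * (eta1 * charge - discharge / eta2
                                   + g_in - g_spill)
        levels.append(e)
    return levels

# Charge with 10 MW for one 3-hour step, then discharge 5 MW for one step.
levels = storage_levels([-10.0, 5.0])
```

Charging stores less energy than is drawn from the grid, and discharging drains more energy than is delivered, so a round trip in this example returns roughly 81% of the input.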
The (perfectly inelastic) electricity demand $d_{i,t}$ at each node $i$ must be met at each time $t$ by either local generators and storage or by the flow $f_{\ell,t}$ from a transmission line $\ell$:
$$\sum_{s} g_{i,s,t} - d_{i,t} = \sum_{\ell} K_{i\ell}\, f_{\ell,t} ,$$
where $K_{i\ell}$ is the incidence matrix of the network. This equation is Kirchhoff's Current Law (KCL) expressed in terms of the active power.
In the present paper the linear load flow is used, which has been shown to be a good approximation for a well-compensated transmission network [65], including for simulations using a large-scale European transmission model [15]. To guarantee the physicality of the network flows, in addition to KCL, Kirchhoff's Voltage Law (KVL) must be enforced in each connected network. KVL states that the voltage differences around any closed cycle in the network must sum to zero. If each independent cycle $c$ is expressed as a directed combination of lines $\ell$ by a matrix $C_{\ell c}$, then KVL becomes the constraint
$$\sum_{\ell} C_{\ell c}\, x_\ell\, f_{\ell,t} = 0 , \qquad (14)$$
where $x_\ell$ is the series inductive reactance of line $\ell$. It was found in [39] that expressing the linear load flow equations in this way with cycle constraints is computationally more efficient than angle- or PTDF-based formulations. Note that point-to-point HVDC lines have no cycles, so there is no constraint on their flow beyond KCL.
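A small triangle network shows how the nodal balance (KCL) and the cycle constraint (KVL) pin down the flows together. The sign convention (+1 where a line starts, -1 where it ends), the unit reactances and the injections are assumptions made for this illustration.

```python
def kcl_residuals(injections, incidence, flows):
    """Net injection minus sum_l K[i][l] * f[l] per node (zero if KCL holds)."""
    return [
        p - sum(k_il * f for k_il, f in zip(row, flows))
        for p, row in zip(injections, incidence)
    ]

def kvl_residuals(cycles, reactances, flows):
    """sum_l C[l][c] * x[l] * f[l] for each cycle c (zero if KVL holds)."""
    n_cycles = len(cycles[0])
    return [
        sum(cycles[l][c] * reactances[l] * flows[l] for l in range(len(flows)))
        for c in range(n_cycles)
    ]

# Triangle network with lines 0->1, 1->2 and 0->2 (assumed equal reactances).
incidence = [[1, 0, 1],     # node 0: start of lines 0 and 2
             [-1, 1, 0],    # node 1: end of line 0, start of line 1
             [0, -1, -1]]   # node 2: end of lines 1 and 2
cycles = [[1], [1], [-1]]   # one cycle 0->1->2->0; line 2 is traversed backwards
reactances = [1.0, 1.0, 1.0]
injections = [1.0, 0.0, -1.0]    # 1 MW fed in at node 0 and withdrawn at node 2
flows = [1 / 3, 1 / 3, 2 / 3]    # the direct path carries twice the detour flow

kcl = kcl_residuals(injections, incidence, flows)
kvl = kvl_residuals(cycles, reactances, flows)
```

KCL alone would also accept sending the full 1 MW over the direct line; it is the cycle constraint that forces the split in inverse proportion to the path reactances.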
The flows are also constrained by the line capacities $F_\ell$:
$$|f_{\ell,t}| \le b\, F_\ell .$$
Although the capacities $F_\ell$ are subject to optimisation, no new grid topologies are considered beyond those planned in the TYNDP 2018 [26]. The factor $b = 0.7$ leaves a buffer of 30% of the line capacities to account for $n - 1$ line outages and reactive power flows. The choice of 70% for $b$ is standard in the grid modelling literature [64,17,29,15] and is also the target fraction of cross-border capacity that should be available for cross-border trading in the European Union (EU) by 2025, as set in the 2019 EU Electricity Market Regulation [6].
Since line capacities $F_\ell$ can be continuously expanded to represent the addition of new circuits, the impedances $x_\ell$ of the lines would also decrease. In principle this would introduce a bilinear coupling in equation (14) between the $x_\ell$ and the $f_{\ell,t}$. To keep the optimisation problem linear and therefore computationally fast, $x_\ell$ is left fixed in each optimisation problem and updated afterwards, and then the optimisation problem is rerun, in up to 4 iterations to ensure convergence, following the methodology of [32,47].
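The iteration between a linear optimisation with fixed reactances and a reactance update can be sketched as below. Two things are assumptions rather than statements from the text: the reactance is taken to scale inversely with capacity (as it would for identical parallel circuits), and `solve` is a hypothetical stand-in for the optimisation that maps reactances to optimal line capacities. The actual procedure follows [32,47].

```python
def update_reactances(x_initial, f_initial, f_optimised):
    """Scale each reactance inversely with capacity (assumed parallel circuits)."""
    return [x0 * f0 / f for x0, f0, f in zip(x_initial, f_initial, f_optimised)]

def iterate_expansion(solve, x_initial, f_initial, max_iterations=4, tol=1e-3):
    """Alternate between a solve with fixed reactances and a reactance update.

    solve: hypothetical callable, list of reactances -> list of capacities.
    Stops early once capacities change by less than the relative tolerance.
    """
    x = list(x_initial)
    capacities = list(f_initial)
    for _ in range(max_iterations):
        new_capacities = solve(x)
        converged = all(abs(a - b) <= tol * max(abs(b), 1.0)
                        for a, b in zip(new_capacities, capacities))
        capacities = new_capacities
        x = update_reactances(x_initial, f_initial, capacities)
        if converged:
            break
    return capacities, x

# Dummy solver that always doubles the first line (illustration only).
capacities, x_final = iterate_expansion(
    lambda x: [2000.0, 1000.0], [10.0, 20.0], [1000.0, 1000.0])
```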
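In order to investigate the effects of transmission expansion, each line capacity $F_\ell$ can be extended beyond the capacity in 2018, $F_\ell \ge F^{2018}_\ell$, up to a line volume cap $\text{CAP}_{\text{trans}}$, which is then varied in different simulations:
$$\sum_{\ell} l_\ell \left( F_\ell - F^{2018}_\ell \right) \le \text{CAP}_{\text{trans}} , \qquad (17)$$
where $l_\ell$ is the length of line $\ell$. The caps are defined in relation to 2018's line capacities $F^{2018}_\ell$, i.e.
$$\text{CAP}_{\text{trans}} = \gamma \sum_{\ell} l_\ell\, F^{2018}_\ell ,$$
where $\gamma$ is varied between zero and 50%.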
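A worked example of the line volume cap helps to fix the units (MW times km). The two-line network and its capacities and lengths are hypothetical; the check follows the description that new capacity times line length, summed over all lines, may not exceed a given share of the 2018 line volume.

```python
def line_volume(capacities_mw, lengths_km):
    """Sum of capacity times length over all lines, in MW km."""
    return sum(f * l for f, l in zip(capacities_mw, lengths_km))

def expansion_within_cap(f_new, f_2018, lengths_km, expansion_share):
    """Check that no line shrinks and the added line volume respects the cap."""
    cap = expansion_share * line_volume(f_2018, lengths_km)
    added = line_volume([a - b for a, b in zip(f_new, f_2018)], lengths_km)
    no_shrinking = all(a >= b for a, b in zip(f_new, f_2018))
    return no_shrinking and added <= cap

# Hypothetical grid: a 100 km line with 1000 MW and a 50 km line with 2000 MW.
f_2018 = [1000.0, 2000.0]
lengths = [100.0, 50.0]
ok = expansion_within_cap([1400.0, 2000.0], f_2018, lengths, 0.25)
too_much = expansion_within_cap([1600.0, 2000.0], f_2018, lengths, 0.25)
```

With a 25% expansion the cap is 50,000 MW km here, so 400 MW on the long line fits while 600 MW does not. Because the cap is a volume, the same budget buys twice as much capacity on the short line.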
Since there is a cap on the transmission expansion, the line costs can be set to zero. For the results, costs are added after the simulation based on the assumptions in Table 3.

Model output data
The optimised model returns the spatially-resolved capacity $G_{i,s}$ for each technology $s$, as well as the amount of transmission expansion $F_\ell$ of each included line $\ell$. Additionally, the results also provide dispatch time series for each of the generators $g_{i,s,t}$ and electricity flows $f_{\ell,t}$ for included lines that obey the constraints described above in subsection 2.6.
Figure 3 presents the total annual system costs for each case. To obtain a better understanding of the system composition, Figure 4 breaks down the total costs into individual components when there is no grid expansion. In Figure 5 we present total system costs for different grid expansion scenarios for 256 clusters in the simultaneous case (Case 1). An example map of investments can be found in Figure 6 for a 25% grid expansion (a similar level to ENTSO-E's TYNDP [26]).

Case 1 -Increasing number of both generation sites and transmission nodes
If the resource and network resolutions increase in tandem according to Case 1 without grid expansion, the total annual system costs in Figure 3 rise gently with the increasing number of nodes, reaching a maximum of 273 billion euros per year at 1024 nodes, which is 10% more expensive than the solution with 37 nodes. This corresponds to an average system cost of 87 €/MWh. If some transmission expansion is allowed, costs are lower, and there is almost no change in total system costs as the number of nodes is varied.
However, the fact that costs are flat does not mean that the solutions are similar: a large shift from offshore wind at low resolution to onshore wind at high resolution can be observed in the left graph of Figure 4 (Case 1). This is an indication that spatial resolution can have a very strong effect on energy modelling results. To understand what causes this effect, we must examine Cases 2 and 3.

Case 2 -Importance of wind and solar resource granularity
In Case 2 we use the lowest network resolution of 37 nodes, corresponding to one-node-per-country-zone, and investigate the effect of changing the number of wind and solar sites on the results. As the resolution increases, total costs without grid expansion in Figure 3 drop by 10% from 248 to 222 billion euros per year. Although the slope of the cost curve appears constant, note that the x-axis is logarithmic, so that the rate of cost decrease slows as the number of sites increases.
The cost reduction is driven by strong changes in the investment between generation technologies, particularly the ratio between offshore and onshore wind (see Figure 4). At low spatial resolution, good and bad onshore sites are mixed together, diluting onshore capacity factors and making onshore a less attractive investment. Figure 9 in the Appendix shows how the capacity factors for wind and solar vary across the continent. While offshore is spatially concentrated and solar capacity factors are relatively evenly spread in each country-zone, onshore wind is stronger near coastlines. At high spatial resolution the model can choose to put onshore wind only at the best sites (within land restrictions), increasing average capacity factors and thus lowering the per-MWh cost. (The increasing average capacity factors are plotted in Figure 11 in the Appendix.) As a result, onshore wind investments more than double from 24 to 54 billion euros per year, while offshore investments drop 37% from 100 to 64 billion euros per year and solar by 23%. The biggest effect on the technology mix is when going from 37 to around 181 clusters; beyond that the changes are smaller.

Case 3 -Impact of transmission bottlenecks
In Case 3 we fix a high resolution of wind and solar generators (1024 sites) and vary the resolution of the transmission network to gauge the impact of transmission bottlenecks. With 37 network nodes many bottlenecks are not visible, so costs are lower, but increasing the resolution to 1024 nodes drives up costs by 23%. Note that because the x-axis is logarithmic, the highest rate of cost increase is when the number of nodes is small.
As can be seen from the breakdown in Figure 4, the rising transmission investments from the higher resolution only have a small contribution to the result. Instead, rising costs are driven by generation and storage. Unlike Cases 1 and 2, the ratio between the generation technologies does not change dramatically with the number of clusters, but the capacities for onshore wind, solar, batteries and hydrogen storage all rise.
The transmission bottlenecks limit the transfer of power from the best sites to the load, forcing the model to build onshore wind and solar more locally at sites with lower capacity factors. Average capacity factors of onshore wind and solar sink by 11% and 6% respectively with no grid expansion (see Figure 11 in the Appendix), meaning that more capacity is needed for the same energy yield. Curtailment is generally low in the optimal solution (around 3% of available wind and solar energy) and has less of an effect on costs (see Figure 12 in the Appendix).
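The link between lower capacity factors and higher capacity needs can be checked with a short calculation. The sketch below assumes that the 11% and 6% figures are relative reductions of the average capacity factor and ignores curtailment; the function name is hypothetical.

```python
def capacity_scaling(cf_drop: float) -> float:
    """Factor by which capacity must grow to keep the energy yield constant
    when the average capacity factor falls by the relative fraction cf_drop."""
    if not 0.0 <= cf_drop < 1.0:
        raise ValueError("cf_drop must lie in [0, 1)")
    return 1.0 / (1.0 - cf_drop)


# Relative capacity-factor reductions quoted above for Case 3 (no grid expansion).
reported_drops = {"onshore wind": 0.11, "solar": 0.06}

for tech, drop in reported_drops.items():
    extra = capacity_scaling(drop) - 1.0
    print(f"{tech}: {extra:.1%} more capacity for the same energy")
```

Under these assumptions the required capacity rises by roughly 12% for onshore wind and 6% for solar, slightly more than the capacity factor reductions themselves because energy yield is the product of capacity and capacity factor.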
Investment in battery and hydrogen storage rises with the number of network nodes since the storage is used to balance local wind and solar variations in order to avoid overloading the grid bottlenecks.

Comparison of the three cases
Separating the effects of resource resolution from network resolution reveals that the apparent stability of total system costs in Case 1 in Figure 3 as the number of clusters changes, as reported in [36], is deceptive. In fact, the sinking costs from the higher resource resolution are counter-acted by the rising costs from network bottlenecks. With no grid expansion, the system cost of network bottlenecks is double the benefit of the higher resource resolution.
While these two effects offset each other at the level of total system costs, they have very different effects on the technology mix. Resource resolution leads to much stronger investment in onshore wind, once good sites are revealed. Network bottlenecks have only a weak effect on the ratio of generation technologies, but lead to lower average capacity factors and drive up storage requirements.

Benefits of grid expansion
Grid expansion does not affect the main qualitative features of the different Cases, but it does have the overall effect of lowering total system costs. In Case 1, the total cost benefit of grid expansion is highest at around 16% for a 50% increase in grid capacity. The total benefit is still increasing at this level of expansion, but it is subject to diminishing returns (see Appendix Figure 14 for a comparison of the marginal benefit to the cost of transmission). The first 9% of additional grid capacity brings total cost savings of up to 8%, but for each extra increment of grid expansion, the benefit is weaker. There is more benefit from grid expansion at a higher number of nodes, since the higher network resolution reveals more critical bottlenecks in the transmission system.
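The diminishing returns can be read off the two data points quoted above. The sketch below treats the quoted savings ("up to 8%" after 9% expansion, "around 16%" after 50% expansion) as exact values, which is an assumption made only to show the arithmetic.

```python
# (grid expansion in %, total cost saving in %) as quoted above for Case 1.
# The savings are approximate upper values, so the results are indicative only.
points = [(0.0, 0.0), (9.0, 8.0), (50.0, 16.0)]


def average_marginal_benefits(pts):
    """Saving in percentage points per percentage point of added grid capacity,
    averaged over each interval between consecutive points."""
    return [
        (s1 - s0) / (x1 - x0)
        for (x0, s0), (x1, s1) in zip(pts, pts[1:])
    ]


first, second = average_marginal_benefits(points)
print(f"0-9% expansion:  {first:.2f} points of saving per point of expansion")
print(f"9-50% expansion: {second:.2f} points of saving per point of expansion")
```

With these numbers the average benefit per unit of expansion in the first interval is several times larger than in the second, while the total benefit keeps growing.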
The total savings from 25% and 50% grid expansion are around 36 and 44 billion euros per year respectively. In a 2018 study ENTSO-E examined scenarios with up to 75% renewable electricity in Europe in 2040 with and without planned TYNDP grid expansions (corresponding to around 25% grid expansion), given fixed demand and a fixed generation fleet. They found that the grid reinforcements reduce generation costs by 43 billion euros per year. This is higher than our cost-benefit for 25% grid expansion, despite their study's lower level of renewable electricity, because in our simulations the generation and storage fleet can be reoptimised to accommodate the lower level of grid capacity, and because we subtract the costs of new grid reinforcement from the cost-benefit (a contribution of around 3.5 billion euros per year).
The breakdown of system cost as the grid is expanded for a fixed number of clusters (256), plotted in Figure 5, reveals how costs are reduced. Although the investment in transmission lines rises, generation and storage costs reduce faster as investment shifts from solar and onshore wind to offshore wind. Offshore wind reduces costs because of its high capacity factors and more regular generation pattern in time. It can be transported around the continent more easily with more transmission, and benefits from the smoothing effects over a large, continental area that grid expansion enables. The map of investments in Figure 6 shows how offshore wind is balanced by new transmission around the North Sea, which smooths out weather systems that roll across the continent from the Atlantic. Further transmission reinforcements bring energy inland from the coastlines to load centers. With more transmission, there is less investment in battery and hydrogen storage, as a result of the better balancing of weather-driven variability in space.
Turning to Case 3, we see that grid expansion mitigates the effect of network resolution by allowing bottlenecks to be alleviated. For a 50% increase in transmission capacity, total costs rise by only 4% from 90 nodes up to 1024 nodes. The distribution of investments between technologies also barely changes in this range (see Appendix Figure 10). This means that a grid resolution of around 90 nodes can give acceptable solutions for grid expansion scenarios if computational resources are limited, as long as the wind and solar resolution is high enough (as in Case 2, 181 generation sites would suffice). Without grid expansion, a higher grid resolution is needed to capture the effects of bottlenecks and achieve reliable results.

Computation times and memory
Besides the poor availability of data at high resolution, one of the main motivations for clustering the network is to reduce the number of variables and thus the computation time of the optimisation. In Appendix Figure 15 the memory and solving time requirements for each Case are displayed as a function of the number of clusters. Both memory and solving time become limiting factors in Cases 1 and 3, with random access memory (RAM) usage peaking at around 115 GB and solving time at around 6 days for 1024 clusters. Beyond this number of clusters no consistent convergence in the solutions was seen.
Case 2, where the network resolution is left low and the resource resolution is increased, shows seven times lower memory consumption and up to thirteen times faster solving times compared to Cases 1 and 3 for the same number of clusters. It is therefore the network resolution rather than the resource resolution that drives up computational requirements, which it does by introducing many new variables and possible spatial trade-offs into the optimisation. Since Case 2 proved relatively reliable for estimating the ratio between technologies, if not their total capacity, it may prove attractive to increase the resource resolution rather than the network resolution if computational resources are limited.

Further results
Further results on curtailment, average capacity factors, the distribution of technologies between countries, maps, network flows and shadow prices can be found in the Appendix, as well as a discussion of the limitations of the model.

Conclusion
From these investigations we can draw several conclusions. Modellers need to take account of spatial resolution, since it can have a strong effect on modelling results. In our co-optimization of generation, storage and network capacities, higher network resolution can drive up total system costs by as much as 23%. Higher costs are driven by the network bottlenecks revealed at higher resolution that limit access to wind and solar sites with high capacity factors. On the other hand, resource resolution affects the balance of technologies by revealing more advantageous onshore wind sites. In both cases the system costs are driven more by the useable generation resources than investments in the grid or storage.
If grid expansion can be assumed, a grid resolution of 90 nodes for Europe is sufficient to capture costs and technology investments as long as the solar and onshore wind resolution is at least around 181 nodes. If grid expansion is not possible, a higher spatial resolution for the grid is required for reliable results on technology choices. Since grid expansion is likely to be limited in the future by low public acceptance, more attention will have to be paid to the computational challenge of optimizing investments at high spatial granularity.

Lead contact
Please contact the Lead Contact, Martha M. Frysztacki (martha.frysztacki@kit.edu), for information related to the data and code described in the following Material and Methods section.

Figure 7: Pearson's correlation coefficient of mapped flows (blue). Note that the x-axis is non-linear, therefore we mark a linear fit to the data (red).

Materials Availability
No materials were used in this study.

Data and Code Availability
All the code and input data from PyPSA-Eur are openly available online on GitHub and Zenodo [8,38]. All model output data is available on Zenodo under a Creative Commons Attribution Licence [27].

Glossary
All notation is listed in Table 2.

Acknowledgements
We thank Martin Greiner, Fabian Neumann, Lina Reichenberg, Mirko Schäfer, David Schlachtberger, Kais Siala and Lisa Zeyen for helpful discussions, suggestions and comments. MF, JH and TB acknowledge funding from the Helmholtz Association under grant no. VH-NG-1352. The responsibility for the contents lies with the authors.

Declaration of Interests
The authors declare that they have no competing financial interests.

A.1. Preservation of flow patterns with clustering
To understand how well the k-means clustering preserves flow patterns, we took a fixed dispatch pattern for the assets in Europe at high resolution and examined how the network flows changed as the network was clustered.
The fixed dispatch was determined by solving the linearised optimal power flow problem for a 1024-node representation of today's European electricity system. The asset dispatch was then mapped into the clustered networks. If lines $\ell \in N_{c,d}$ in the 1024-node network were mapped to a single representative line $\ell_{c,d}$ in the clustered network, the summed flows from the original network, $\hat{f}_{c,d,t} = \sum_{\ell \in N_{c,d}} f_{\ell,t}$ ('microscopic flows'), were then compared to the flow $f_{c,d,t}$ in line $\ell_{c,d}$ of the clustered network ('macroscopic flows'). Figure 7 shows the Pearson correlation coefficient between the flows $f_{c,d,t}$ of aggregated lines $\ell_{c,d}$ in the lower resolution network and the summed flows $\hat{f}_{c,d,t}$ of all lines in $N_{c,d}$ in the full resolution network. Red is a linear fit through the points. The distortion from linearity is due to a non-linear scale in the x-axis. Even at 37 nodes the correlation between the flows is good (Pearson correlation coefficient above 0.90) and shows an improving trend until at full 1024-node resolution the flows are once again perfectly equal.
Example density plots of the summed microscopic flows $\hat{f}_{c,d,t}$ against the macroscopic flows $f_{c,d,t}$ for all lines and all times are plotted for different clustering levels in Figure 8. The match between the flows is better for higher resolution networks, with a near-diagonal line already for 362 nodes.
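The comparison of microscopic and macroscopic flows can be sketched in a few lines. The example below uses synthetic random data in place of real network flows: three hypothetical high-resolution lines are mapped to one clustered line, and the clustered-network flow is imitated by adding noise to the summed flow. The data, the noise level and the helper names are all assumptions for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def summed_flows(line_flows, corridor_lines):
    """Sum the time series of all high-resolution lines mapped to one corridor."""
    n_steps = len(next(iter(line_flows.values())))
    return [sum(line_flows[name][t] for name in corridor_lines) for t in range(n_steps)]


# Synthetic data (assumed): three parallel lines mapped to one clustered line.
random.seed(0)
T = 200
names = ["line_1", "line_2", "line_3"]
line_flows = {name: [random.gauss(0.0, 100.0) for _ in range(T)] for name in names}

micro = summed_flows(line_flows, names)
# Stand-in for the flow in the clustered network: summed flow plus a distortion.
macro = [flow + random.gauss(0.0, 50.0) for flow in micro]

r = pearson(micro, macro)
print(f"Pearson r = {r:.3f}")
```

In an actual analysis the macroscopic series would come from solving the power flow on the clustered network, and the correlation would be evaluated over all aggregated lines rather than a single corridor.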
For a more probabilistic approach, we perform a kernel density estimation (KDE) by applying a fast Fourier transformation of aggregated flows of the higher resolved network versus the flows of the low resolution network. Aggregated flows $\hat{f}_{c,d,t}$ are considered an estimator for the flow $f_{c,d,t}$ in the representative lower resolution network. The resulting density functions from the KDE are displayed in Figure 8. For the low resolution network, the probability distribution has two different modes, while a higher resolution network approaches a Gaussian distribution. The variance of the probability density function for a low resolution network is higher than for a high resolution network, as each of the quantile isolines is broader.

The graphics show that capacity factors for solar are decreasing from South to North while those for wind are increasing towards the North and Baltic Sea. The average capacity factors are spatially correlated, but as they are aggregated over larger and larger areas using the weighted average from the clustering approach in equation (6), they decline as bad sites are mixed with good sites. This is reflected in Figure 11, which shows how the average capacity factors per technology for the generation fleet optimized over the whole of Europe change with the clustering.

Figure 10 shows an extension of the cost breakdowns in Figure 4 from the scenario with no transmission expansion to scenarios with 25% and 50% grid expansion. The general trends are the same as for the scenario without grid expansion, but grid expansion generally allows more wind capacity to be built, resulting in lower investment in solar, batteries and hydrogen storage, as was seen in Figure 5.

A.4. Average capacity factors per technology
To understand how the model exploits the best available resource sites per node, we examine a time-averaged technology-specific capacity factor $\bar{g}_s$. The capacity factor is weighted by how much capacity $G_{n,s}$ of technology $s$ was built at each node $n$ with time-averaged capacity factor $\bar{g}_{n,s} = \langle \bar{g}_{n,s,t} \rangle_t$.
We present this technology-specific capacity factor in Figure 11 for all three cases with the no-expansion transmission scenario, i.e. where transmission capacities are fixed at their 2018 values. As the number of clusters increases, Case 2 has a larger variety of sites per node to choose where capacity should be installed optimally and is not restricted by transmission constraints beyond country-zones. Therefore, the more sites are available, the higher the weighted capacity factor is because it is not mixed with lower capacity factor sites in equation (6). The highest resolution of Case 2 is also the lowest resolution of Case 3: many resource sites and only one node per country-zone. As the number of nodes in Case 3 increases while the same sites are available, transmission bottlenecks force the model to build more capacity in locations of worse capacity factors. Therefore, the capacity factors drop again. For Case 1, where resource resolution and network resolution change in tandem, the resource resolution dominates and we see increasing capacity factors like in Case 2.
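A capacity-weighted average of this kind can be sketched as follows. The two-node example and its numbers are hypothetical, and the weighting formula (sum of capacity times nodal capacity factor, divided by total capacity) is our reading of the verbal description above rather than a quoted equation.

```python
def weighted_capacity_factor(capacity_mw, mean_cf):
    """Capacity-weighted average of nodal time-averaged capacity factors.

    Both arguments are dicts keyed by node name.
    """
    total = sum(capacity_mw.values())
    if total == 0:
        raise ValueError("no capacity installed")
    return sum(capacity_mw[node] * mean_cf[node] for node in capacity_mw) / total


# Hypothetical time-averaged capacity factors at two nodes.
mean_cf = {"coast": 0.40, "inland": 0.20}

# Without bottlenecks most capacity can be sited at the better node.
unconstrained = {"coast": 900.0, "inland": 100.0}
# With bottlenecks more capacity must be built close to the inland load.
constrained = {"coast": 500.0, "inland": 500.0}

cf_unconstrained = weighted_capacity_factor(unconstrained, mean_cf)
cf_constrained = weighted_capacity_factor(constrained, mean_cf)
print(f"{cf_unconstrained:.2f} vs {cf_constrained:.2f}")
```

Shifting capacity towards the worse node lowers the fleet-wide capacity factor even though the resource at each node is unchanged, which mirrors the behaviour described for Case 3.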

A.5. Curtailment per technology
Curtailment is the amount of energy that is available in theory but cannot be injected into the grid because of transmission constraints or a lack of demand: $\bar{g}_{n,s,t} \cdot G_{n,s} - g_{n,s,t}$, i.e. the per-unit availability $\bar{g}_{n,s,t}$ times the installed capacity $G_{n,s}$, minus the actual dispatch $g_{n,s,t}$ of technology $s$ at node $n$ and time $t$.
Figure 12 shows total curtailment per technology in all Cases. Curtailment in all situations is low (less than 4% of total demand). Curtailment increases with higher network resolution in both Cases 1 and 3, which incorporate transmission constraints, while it gently decreases with resource resolution in Case 2, where there are only transmission constraints at the boundaries of country-zones.
Figures 4 and 10 show the breakdown of total costs by technology for the whole of Europe. However, it could be that for each technology, the spatial distribution is unstable, moving from country to country with the clustering changes.
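Returning to the curtailment definition at the start of this subsection, the following sketch evaluates it for a single generator over three time steps. All numbers are made up, equal time-step weights are assumed, and the share is expressed relative to available energy rather than demand.

```python
def curtailment(availability_pu, capacity_mw, dispatch_mw):
    """Curtailed power per time step: available power minus dispatched power."""
    return [a * capacity_mw - d for a, d in zip(availability_pu, dispatch_mw)]


# Hypothetical single generator over three time steps.
availability = [0.2, 0.8, 0.5]   # per-unit availability (assumed)
capacity = 100.0                 # installed capacity in MW (assumed)
dispatch = [20.0, 60.0, 50.0]    # actual dispatch in MW (assumed)

curtailed = curtailment(availability, capacity, dispatch)
available_energy = sum(a * capacity for a in availability)
curtailed_share = sum(curtailed) / available_energy
print(curtailed, f"{curtailed_share:.1%}")
```

Only the second time step is curtailed in this example, because the generator is dispatched at its full availability in the other two.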

A.6. Breakdowns by country
For a better understanding of the spatial distribution of installed capacity, we examine the total installed renewable capacity per country in all Cases in Figure 13 with no transmission expansion. The general trend is that the total installed capacity per country is relatively stable with cluster resolution. In Case 2 capacity decreases with resolution, since the exploitation of better resource sites means that less capacity is needed for a given energy yield. The opposite effect is seen in Case 3, while Case 1 reveals a mix of the effects of Case 2 and 3.

A.7. Shadow price of line volume constraint
The shadow price $\mu_{\mathrm{trans}}$ of the transmission expansion constraint in equation (16) corresponds to the system cost benefit of an incremental MWkm of line volume. Read another way, it is the line cost required to obtain the same solution with the constraint removed (i.e. lifting the constraint into the objective function as a Lagrangian relaxation).
We present the resulting shadow prices in Figure 14, where they are compared with the annuity for underground and overhead lines. Using the cost of underground cables, the cost-optimal solution would give a grid expansion of 25-50% at high resolution. For overhead transmission, the cost optimum would be over 50%.
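The two readings of the shadow price can be illustrated with a toy two-node system, which is not the model used in this study. Demand at one node can be met by cheap imports over a capacity-limited line or by expensive local generation; all costs, the demand and the function names are hypothetical.

```python
def dispatch_cost(line_cap, demand=100.0, cheap=20.0, expensive=50.0):
    """Cost of serving demand at node B: imports from cheap node A are limited
    by the line capacity, the remainder is generated locally at higher cost."""
    imports = min(line_cap, demand)
    return cheap * imports + expensive * (demand - imports)


def shadow_price(line_cap, eps=1e-6):
    """Finite-difference estimate of the cost saving per extra unit of capacity."""
    return (dispatch_cost(line_cap) - dispatch_cost(line_cap + eps)) / eps


def optimal_capacity(line_cost, candidates):
    """Constraint lifted into the objective: minimise dispatch plus line cost."""
    return min(candidates, key=lambda cap: dispatch_cost(cap) + line_cost * cap)


candidates = range(0, 151, 10)
print(round(shadow_price(60.0), 3))       # binding constraint
print(round(shadow_price(120.0), 3))      # non-binding constraint
print(optimal_capacity(25.0, candidates))  # line cheaper than the shadow price
print(optimal_capacity(35.0, candidates))  # line more expensive than the shadow price
```

While the line is congested, the shadow price equals the cost difference between local and imported generation. If the line cost per unit of capacity lies below that value, the unconstrained problem builds the line up to the demand; if it lies above, it builds none. This is the same comparison that Figure 14 makes between the shadow price and the annuity of new lines.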

A.8. Capacity factors within each cluster region for wind and solar
In this subsection we analyse the homogeneity of time-averaged capacity factors for wind and solar within each cluster region as the number of clusters changes. Duration curves of the capacity factors in each of the 0.3° × 0.3° weather pixels of the original ERA5 reanalysis dataset [4] for the European area ('cutout') are plotted in blue in Figure 16. In addition, the duration curves for the pixels in each cluster are plotted in orange, with the median for each cluster in red. This reveals how much the capacity factors of wind and solar vary within each cluster region, compared to the whole of Europe. Table 4 presents the average standard deviation within each cluster region for each technology and resolution.
For a high resolution of 1024 clusters, we observe that the median values (red dots) for solar lie very close to the representative values of Europe (black line) with a relatively small average standard deviation of $1.9 \cdot 10^{-3}$ inside each cluster region (scattering of the orange dots). In the case of onshore wind, the high capacity factors are underestimated by the median value, while intermediate and low capacity factors are represented with a minor difference between median and representative European value. For onshore wind, the average standard deviation of the capacity factors within each region is larger than for solar by one order of magnitude ($\mathcal{O}(10^{-2})$, represented by the scattering of orange dots). The largest variance can be observed in offshore regions, where the average standard deviation is $4.3 \cdot 10^{-2}$, twice as large as for onshore regions, and the low capacity factors are overestimated by their representative median values.
In the case of 256 clusters, the standard deviation per region (scattered orange dots) doubles compared to a resolution of 1024 sites for solar and increases by about 50% for onshore and offshore wind. However, the median values (red dots) per site do not change much compared to the higher resolution case. Only at very low resolutions or, in the extreme, with one site representing one country-zone, do the median values (red dots) disagree with the European curve (black line), and the capacity factor values per site (orange scattered dots) cover a wide range of values (for example 0 to 0.5 for onshore wind, or 0.11 to 0.18 for solar). At 37 nodes, the average standard deviation is three times larger for solar compared to a resolution of 1024 sites and twice as large for onshore wind.
From this analysis we can conclude that a resource resolution of at least several hundred nodes is required to adequately capture the resource variation within Europe, with a higher resolution required for wind than for solar.
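The homogeneity measure discussed in this subsection can be sketched with a few lines of standard-library Python. The pixel capacity factors and the cluster assignments below are invented, and the use of an unweighted population standard deviation is an assumption; the study may weight pixels differently.

```python
import statistics


def average_within_cluster_std(cf_by_cluster):
    """Mean over clusters of the population standard deviation of the pixel
    capacity factors inside each cluster."""
    return statistics.fmean(
        [statistics.pstdev(cfs) for cfs in cf_by_cluster.values()]
    )


# Hypothetical time-averaged capacity factors of eight weather pixels.
pixels = [0.10, 0.12, 0.30, 0.34, 0.20, 0.22, 0.45, 0.47]

# Low resolution: all pixels share one cluster.
coarse = {"c0": pixels}
# Higher resolution: neighbouring pixels with similar resources are grouped.
fine = {
    "c0": pixels[0:2],
    "c1": pixels[2:4],
    "c2": pixels[4:6],
    "c3": pixels[6:8],
}

std_coarse = average_within_cluster_std(coarse)
std_fine = average_within_cluster_std(fine)
print(f"coarse: {std_coarse:.4f}, fine: {std_fine:.4f}")
```

When clusters group pixels with similar resources, the average spread inside each cluster falls sharply, which is the pattern reported for increasing resolution.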

A.9. Limitations of this study
The need to solve the models at high spatial resolution and 3-hourly temporal resolution in reasonable time means that compromises have been made elsewhere: the conventional generation technologies are limited to hydroelectricity and gas turbines, the storage is limited to batteries and hydrogen storage, only a single weather year is modelled, and ancillary services, grid losses, discretisation of new grid capacities, distribution grids and forecast error are not modelled. This allows us to focus on the main interactions between wind, solar and the transmission grid; the effects of the other factors are expected to be small [12] since wind and solar investment dominates system costs. If it were cost-effective to build dispatchable low-carbon generators like nuclear or fossil generators with carbon capture and sequestration, then the effects of resource and network resolution would be dampened, since there would be less wind and solar investment.
Some of the quantitative conclusions may depend on the technology assumptions, such as the relative cost of solar PV, onshore wind and offshore wind. However, investigations of the sensitivities of similar models to generation costs [56] and of the near-optimal space of solutions [48] have shown that a large share of wind in low-cost scenarios for Europe is robust across many scenarios because of the seasonal matching of wind to demand in Europe. It is the interactions between wind and the transmission grid that drive the results in this paper.
The results may also change as additional energy sectors are coupled to the power sector, such as building heating, transport and non-electric industry demand. While extra flexibility from these sectors might offer an alternative to grid expansion, grid expansion is still expected to be cost-effective [14], while the effects of resource resolution on the optimal solution remain the same.
In the present paper different market structures to today's are assumed, namely nodal pricing to manage grid congestion, and a high CO2 price to obtain a 95% CO2 reduction compared to 1990 levels.
We weighted the distribution of wind and solar inside each nodal region (Voronoi cell) proportional to the installable capacity and capacity factor at each weather grid cell [37]. This means good and bad sites are not mixed evenly, but skewed slightly towards good sites. This effect disappears at high resolution, where the capacity factor is more uniform inside each Voronoi cell.
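The skew towards good sites can be seen in a two-cell example. The sketch below compares weighting by installable capacity alone with weighting by installable capacity times capacity factor, as described above; the cell values and helper names are hypothetical.

```python
def aggregate_cf(cells, weight):
    """Weighted mean capacity factor over weather grid cells.

    cells: list of (installable_capacity_mw, capacity_factor) tuples.
    weight: function mapping (capacity, capacity_factor) to a cell weight.
    """
    total = sum(weight(cap, cf) for cap, cf in cells)
    return sum(weight(cap, cf) * cf for cap, cf in cells) / total


# Two hypothetical grid cells with equal installable capacity.
cells = [(100.0, 0.40), (100.0, 0.20)]

by_capacity = aggregate_cf(cells, lambda cap, cf: cap)
by_capacity_and_cf = aggregate_cf(cells, lambda cap, cf: cap * cf)
print(f"{by_capacity:.3f} vs {by_capacity_and_cf:.3f}")
```

Weighting by capacity and capacity factor gives the better cell twice the weight of the worse one here, so the aggregated capacity factor moves from the plain mean towards the better site. If both cells had the same capacity factor the two weightings would coincide, which is why the effect fades at high resolution.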
Another approach would be to keep a low one-node-per-country network resolution and then have multiple resource classes defined not by region, like our Case 2, but by capacity factor [55,52,45] (e.g. a good class with sites with full load hours above 2000, a medium class between 1500 and 2000, and a bad class below 1500). This would also be beneficial but would not be compatible with the increasing grid resolution, since the generators in each class would be spread non-contiguously over the country.

Table 4: Average standard deviation of the capacity factor (per unit) per region for a network resolution of 1024, 256 and 37 sites.
Clusters | Solar | Onshore wind | Offshore wind
… | $5.0 \cdot 10^{-3}$ | $4.6 \cdot 10^{-2}$ | $5.9 \cdot 10^{-2}$
64 | $6.1 \cdot 10^{-3}$ | $4.9 \cdot 10^{-2}$ | $6.2 \cdot 10^{-2}$
45 | $6.1 \cdot 10^{-3}$ | $4.9 \cdot 10^{-2}$ | $6.2 \cdot 10^{-2}$
37 | $6.2 \cdot 10^{-3}$ | $4.9 \cdot 10^{-2}$ | $6.2 \cdot 10^{-2}$

Figure 16: Breakdown of capacity factors per technology for the weather cutout pixels inside each cluster region as a duration curve (orange), with the median marked in red. The overall duration curve of pixel capacity factors for the whole of Europe is plotted in blue.