The eGo grid model: An open source approach towards a model of German high and extra-high voltage power grids

There are several power grid modelling approaches suitable for simulations in the field of power grid planning. The restrictive policies of grid operators, regulators and research institutes concerning their original data and models have led to an increased interest in open source grid models based on open data. By including all voltage levels between 60 kV (high voltage) and 380 kV (extra-high voltage), we dissolve the common distinction between transmission and distribution grid in energy system models and utilize a single, integrated model instead. An open data set, primarily for Germany, was developed which can be used for non-linear, linear and linear-optimal power flow methods. This data set consists of an electrically parameterised grid topology as well as allocated generation and demand characteristics for present and future scenarios at high spatial and temporal resolution. The usability of the grid model was demonstrated by performing exemplary power flow optimizations. Based on a marginal-cost-driven power plant dispatch subject to grid restrictions, congested power lines were identified. Continuous validation of the model is necessary in order to reliably model storage and grid expansion in ongoing research.


Introduction
Professional grid planning undertaken by grid operators and research institutions has been highly restricted and proprietary in past and recent times [1]. In contrast, the research project open eGo [...] Future contributions affiliated with the project open eGo will focus on the optimization of eventual investment decisions concerning storage and grid expansion. This work describes the exogenous setting for a further elaboration on these endogenous variables.
The development of a grid topology model based on OpenStreetMap (OSM) is challenging, especially when integrating the 110 kV grid. An approach based on the tool osmTGmod is described in the next section. Several assumptions on electrical parameters have to be defined to allow power flow simulations (Section 3). Decoupled from the electric grid, the overall composition of generation and demand is characterized for a status quo scenario and two future scenarios (Section 4). Section 5 describes how generators and loads are spatially allocated and connected to the grid topology. Subsequently, the definition of the temporal resolution is outlined (Section 6). Together, these methodologies lead to a grid data model, which was applied as an example within a Linear Optimal Power Flow (LOPF) method to derive results for power plant dispatch optimization subject to grid restrictions.

A grid topology model based on osmTGmod
The generation of a grid topology can be considered a necessary first step in creating a complete grid model ready for power flow simulations. Due to restrictive data policies, German grid operators only publish very basic information and data on their grid models. While the TSOs publish static grid models of their respective EHV areas, on the HV level the DSOs only publish maps indicating the geographical position of assets, but no information on the technical parameters. Apart from these publications by grid operators, there is the ENTSO-E map comprising the EHV infrastructure in Europe. While all this information could theoretically be made usable for power flow simulations with some effort, the most important factor against [...]. These are typically based on crowd-sourced OpenStreetMap data [1]. All OSM-based models use deterministic or heuristic methods to abstract relevant power topology data sets. Electrical properties have to be calculated and assigned via assumptions to complete the geographical information gathered from raw OSM data (see also Section 3). The OSM power data for the EHV level covers approximately 95 % of the German transmission grid as far as the total length of the lines is concerned. This has proven to be a sufficient topological coverage [1] and has been applied in several academic works [3,9].
The OSM objects relevant for the topological modelling of a power grid are ways and relations. Ways correspond to physical power lines mapped in OSM and consist of a list of the connected nodes and of observable attributes such as the number of cables and wires. Relations can contain a number of ways and additional attributes. They are used in OSM to depict electrical power circuits, which might not be obvious from visual inspection of a line but require a much more in-depth analysis. Relations provide very accurate information relevant for power flow modelling if they are correctly mapped. However, the OSM coverage of relations decreases below 220 kV. Since, in contrast to most other power flow approaches, our model is not limited to the EHV level but aims to represent all voltage levels (see Figure 1), OSM ways have to be considered for depicting the HV level. In our definition, the HV level consists of the voltage levels 110 kV and 60 kV. For legibility, and because there are fewer than 30 lines at 60 kV in Germany, we use the terms HV level and 110 kV synonymously in the following. The data on ways at the HV level seems to be sufficiently representative. Thus, the part of the grid model presented in this contribution intends to cover the exact topology of the voltage levels 110 kV and above. The medium voltage (MV) and low voltage levels are addressed in [8]; they require a different approach due to the lower OSM coverage of these downstream levels, which is mainly caused by the higher share of underground cables not mapped in OSM (see Table 1).

Table 1
Level      Relations   Ways   Integrated in grid models
380 kV     good        good   yes
220 kV     good        good   yes
110 kV     poor        good   typically not
60 kV      poor        good   typically not
≤ 35 kV    none        poor   no

The open source grid model SciGRID only uses the OSM data type relation in order to abstract power circuits [1]. Such an approach can be applied when modeling the EHV grid. Due to the weaker mapping quality of relations (see Table 1), however, it is far too incomplete to be applied to lower voltage levels. For instance, applying the SciGRID model to the 110 kV voltage level resulted in a total power circuit length of 13,256 km, whereas the number provided by the Bundesnetzagentur amounts to 96,658 km [11]. Therefore, our grid model was based on the more heuristic abstraction tool osmTGmod. As a key feature, osmTGmod uses both OSM relations and OSM ways for the construction of a grid model: since the information gathered from relations is considered to be very accurate, it is used first. Afterwards, information that is not covered by relations is added through the consideration of ways. Due to the lower mapping quality and the existence of 110 kV underground cables in urban areas, there are substations or subgrids that are initially not connected to the rest of the grid. In our model, transition points are defined as substations which transfer power flows between the HV and MV levels (see Figure 1). Since load and generation for power flow simulations are allocated to these transition points, their connection to the main grid as well as the connection of subgrids was essential. Finally, by automatically assigning transformers and electrical properties (see Section 3), the resulting grid model of osmTGmod is ready for power flow simulations without the necessity of further major adaptations [1].
We extended the grid model by an abstraction of its electrical neighbours, since the German electricity grid is embedded in the European transmission grid. To this end, we placed buses and transformers for different voltage levels at the center of each of the relevant countries. All existing cross-border lines were then connected to these newly introduced buses according to their voltage level. The electrical properties of these power lines were determined based on information extracted from the ENTSO-E transmission system map [12] and the standard overhead line parameters defined in Section 3.
Applying an OSM data set from October 1st, 2016, the abstraction with osmTGmod leads to a topology model with 11,294 buses and 19,605 branches. Detailed numbers per parameter can be found in Table 2. The distinction between the substations resulting from the osmTGmod abstraction and those filtered for the developed model was necessary for a comprehensive allocation of generation and demand. The mentioned number of 3,702 distinct substations consists of 3,608 HV-MV substations and 94 sole EHV substations. A validation of these transition points and their surrounding grid districts was carried out in [10]. It was shown that the method to identify these areas based on OSM and to allocate generation and demand is applicable and leads to realistic results. Since there is [...] [13]. It became evident that the eGo grid model covered every single line at the HV and EHV levels included in the maps provided by the operator.

Assumptions on electrical properties of the grid topology
Electrical properties had to be assumed and assigned to the generated grid topology in order to perform power flow analyses. Assumptions on lines and transformers were particularly relevant in this context. The assignment of electrical parameters to the topology is part of the tool osmTGmod.
Within each voltage level, one standard overhead line and one underground cable were defined, representing typically used assets in Germany (see Table 3).
Table 3: Electrical parameters of standard overhead lines and underground cables for the EHV and HV level. Source: based on [14]
In order to assign and adjust the adequate standard information to the OSM-based power lines, information about the voltage level (OSM key voltage, V_nom), the length (implicitly derived from the geometry, l_osm) and the number of conductors (OSM key cables, cables_osm) of the OSM topology was used (see also [15,4,1]). The number of circuits (or systems) of a route between two grid buses is calculated by dividing the OSM key cables by three (n_circuits = cables_osm / 3). This assumption was made because of the exclusive existence of three-phase AC systems in the public grid of Germany and Europe [16]. The resistance R, the reactance X, the capacitance C and the thermal limit apparent power S_nom for all lines were then calculated as stated in Equations 1-4 (see [1]). Within these equations, the standard per unit length values (R_lit, L_lit, C_lit) as well as the circuit-specific S_nom,lit, which are based on Table 3, were utilized. The angular frequency (ω = 2 · π · f) corresponding to f = 50 Hz was used.
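As an illustration of how such a parameter assignment could look, the following minimal Python sketch assumes that the circuits of a route are identical and connected in parallel, so that the series quantities (R, X) are divided by the number of circuits while the shunt capacitance and the thermal limit are multiplied by it. This reading of Equations 1-4, the function name and the per-kilometre values in the example are assumptions for illustration; they are not taken from Table 3.

```python
import math

F_HZ = 50.0                     # grid frequency stated in the text
OMEGA = 2.0 * math.pi * F_HZ    # angular frequency, omega = 2 * pi * f


def line_parameters(length_km, cables_osm, r_lit, l_lit, c_lit, s_nom_lit):
    """Scale per-km standard values to one OSM route.

    Assumes identical parallel circuits (an assumption of this sketch).
    r_lit in Ohm/km, l_lit in H/km, c_lit in F/km, s_nom_lit in MVA per circuit.
    """
    if cables_osm <= 0:
        raise ValueError("cables_osm must be positive")
    n_circuits = cables_osm / 3.0  # three-phase AC: three cables per circuit
    return {
        "n_circuits": n_circuits,
        "R_ohm": r_lit * length_km / n_circuits,
        "X_ohm": OMEGA * l_lit * length_km / n_circuits,
        "C_farad": c_lit * length_km * n_circuits,
        "S_nom_mva": s_nom_lit * n_circuits,
    }


# Hypothetical per-km values for a 110 kV overhead line (illustrative only).
example = line_parameters(
    length_km=10.0, cables_osm=6,
    r_lit=0.109, l_lit=1.2e-3, c_lit=9.5e-9, s_nom_lit=260.0,
)
print(example)
```

With six cables the sketch yields two circuits, which halves R and X and doubles C and S_nom relative to a single circuit of the same length.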
To model all possible interconnections between the voltage levels, three representative transformers were defined (see Table 4). Whenever lines of different voltage were connected at one OSM bus, two buses, one for each voltage level, were created and connected by a transformer. The nominal capacity of the transformer was set to the minimum of the sums of S_nom,line taken over all power lines of the same voltage. Thus, the transformer is not expected to be the bottleneck in power flow simulations [15]. The calculated nominal capacity defined the number of transformers installed with respect to the specific S_nom in Table 4. The impedance Z of each transformer, which was assumed to be equal to the reactance X [17], was calculated as defined in Equation 5.
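The sizing rule described above can be sketched as follows. The sketch assumes that the required capacity is the smaller of the two summed line capacities at a substation and that the number of standard units is obtained by rounding up; the rounding rule, the function name and all numbers in the example are assumptions for illustration and are not taken from Table 4.

```python
import math


def transformer_bank(s_nom_lines_by_voltage, s_nom_unit_mva):
    """Size the transformer bank between two voltage levels at one substation.

    s_nom_lines_by_voltage maps a voltage level (kV) to the S_nom values (MVA)
    of the lines connected at that level.
    Returns (required capacity, number of units, installed capacity).
    """
    if len(s_nom_lines_by_voltage) != 2:
        raise ValueError("expected exactly two voltage levels")
    # Sum the line capacities per voltage level; the smaller sum sets the need.
    required_mva = min(sum(values) for values in s_nom_lines_by_voltage.values())
    # Assumed rule: install enough standard units to cover the requirement.
    n_units = max(1, math.ceil(required_mva / s_nom_unit_mva))
    return required_mva, n_units, n_units * s_nom_unit_mva


# Hypothetical substation: two 380 kV circuits and three 110 kV circuits.
required, n_units, installed = transformer_bank(
    {380: [1790.0, 1790.0], 110: [260.0, 260.0, 260.0]},
    s_nom_unit_mva=300.0,
)
print(required, n_units, installed)
```

Because the installed capacity is at least as large as the weaker side of the substation, the transformer bank in this sketch never limits the flow more than the lines do.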
The utilized standard values for the relative short circuit voltage (v_sc,lit) as well as the higher (V_a) and lower (V_b) nominal voltages were based on Table 4.
The assumptions on the definition of different control strategies at the buses were based on [19]. As usual, the bus connected to the largest generation throughout the year was selected as the slack bus. The slack generation has to be provided by a large and flexible power plant. Large power plants (P_nom ≥ 50 MW) were defined as PV generators, characterizing their connecting buses as PV buses. PQ buses were buses to which loads or small power plants (P_nom < 50 MW) are connected. Joints without generation or consumption were also modelled as PQ buses.
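A minimal sketch of this bus classification is given below. The 50 MW threshold is taken from the text; the data layout, the function name and the example values are hypothetical.

```python
PV_THRESHOLD_MW = 50.0  # from the text: plants with P_nom >= 50 MW are PV generators


def classify_buses(all_buses, generators, annual_generation_mwh):
    """Return a dict mapping each bus to 'Slack', 'PV' or 'PQ'.

    generators: list of (bus, p_nom_mw) tuples.
    annual_generation_mwh: dict of bus -> energy generated over the year.
    """
    # Default: buses with loads, small plants or no injection are PQ buses.
    control = {bus: "PQ" for bus in all_buses}
    for bus, p_nom_mw in generators:
        if p_nom_mw >= PV_THRESHOLD_MW:
            control[bus] = "PV"
    # The bus with the largest generation over the year becomes the slack bus.
    slack_bus = max(annual_generation_mwh, key=annual_generation_mwh.get)
    control[slack_bus] = "Slack"
    return control


# Hypothetical four-bus example.
control = classify_buses(
    all_buses=["A", "B", "C", "D"],
    generators=[("A", 800.0), ("B", 120.0), ("C", 20.0)],
    annual_generation_mwh={"A": 5.0e6, "B": 6.0e5, "C": 4.0e4},
)
print(control)
```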

Characterisation of status quo and future scenarios
Three scenarios were defined and used for the intended power flow simulations. Apart from one status quo scenario representing the German electrical energy system in 2015, two future scenarios were defined employing exogenous assumptions. For Germany, the installed generation capacities of the status quo scenario were taken from the power plant list of the Open Power System Data project [20,21] (State: 01-01-2016). The 2035 scenario, in contrast, is based on publicly available information and methods of the Netzentwicklungsplan (NEP) Strom 2025, erster Entwurf [22]. Out of several NEP scenarios, the so-called "B1-2035" was chosen; it is characterized by a high RE expansion and an increased share of natural gas [22]. The third scenario depicts a future electrical energy system powered entirely by RE and is mainly based on the 100% RES scenario of the e-Highway2050 - Modular Development Plan of the Pan-European Transmission System 2050 [23]. In order to build a 100% RE system in Germany, 13 GW of gas-fired power plants were removed (see [24] and [25]).
A few general characteristics were assumed to be the same in all three scenarios. This includes the year used as a source for data on weather and demand, which is the year 2011 (compare [22]). In terms of weather, the year 2011 can be regarded as a moderate wind feed-in year in Germany [27]. Due to the weather dependency of electrical demand, one and the same year was chosen as the source for data on weather and demand. In contrast to the original sources of the scenario parameters, all three scenarios were assumed to have the same load conditions (see Table 5).
We used historical hourly reanalysed weather data from CoastDat-2 [28] to generate temporally and spatially resolved input data for the renewable generation time series (see Section 6.2). The spatial focus of all three scenarios is Germany and its electrical neighbours (AT, CH, CZ, DK, FR, LU, NL, NO, PL, SE). The definition of installed capacities in the neighbouring power systems for the status quo and NEP 2035 scenario was taken from the Scenario Outlook & Adequacy Forecast (SO&AF) scenario B of 2014 at the reference time point of the annual peak load [29,30].
The marginal costs are central values to predefine the market decision on operation by the merit order in the Linear Optimal Power Flow and the market simulation of renpassG!S [31]. The marginal costs were calculated as described in Equation 6.
The CO2 costs for the status quo scenario are set to 5.91 EUR/t CO2 (EEX Germany mean value 2014) and assumed to be 31.00 EUR/t CO2 for NEP 2035 [22] and 62.05 EUR/t CO2 for the eGo 100% scenario [32]. All other assumptions and calculations in order to simulate the three scenarios are fully documented and can be found in [25]. Apart from the exogenous scenario parameters already mentioned, standard power plant parameters such as onshore wind power plant size, height or efficiency factors were defined for the respective generator types and subtypes in Section 6.2.
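To illustrate how the CO2 price enters a merit order, the following sketch uses a common textbook formulation of marginal cost: fuel cost plus CO2 cost, both divided by the electrical efficiency, plus variable operation and maintenance cost. This formulation is an assumption and may differ from Equation 6; the plant data are hypothetical, while the CO2 prices are those quoted above.

```python
CO2_PRICE_EUR_PER_T = {  # values quoted in the text
    "status quo": 5.91,
    "NEP 2035": 31.00,
    "eGo 100%": 62.05,
}


def marginal_cost(fuel_cost, emission_factor, efficiency, co2_price, variable_om=0.0):
    """Marginal cost in EUR/MWh_el (assumed textbook formulation).

    fuel_cost in EUR/MWh_th, emission_factor in t CO2/MWh_th,
    efficiency as a fraction, variable_om in EUR/MWh_el.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (fuel_cost + co2_price * emission_factor) / efficiency + variable_om


def merit_order(plants, co2_price):
    """Sort plant names by ascending marginal cost."""
    return sorted(
        plants,
        key=lambda name: marginal_cost(co2_price=co2_price, **plants[name]),
    )


# Hypothetical plant parameters (illustrative only).
plants = {
    "gas": {"fuel_cost": 25.0, "emission_factor": 0.2, "efficiency": 0.5},
    "lignite": {"fuel_cost": 5.0, "emission_factor": 0.4, "efficiency": 0.4},
}
order_status_quo = merit_order(plants, CO2_PRICE_EUR_PER_T["status quo"])
print(order_status_quo)
```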

Allocation of loads and generators
The following section describes the spatial allocation of generators and loads to the predefined grid topology. Since information on generation and demand affecting the grid is compulsory, generators and loads needed to be assigned to the respective grid level and the relevant substation within this grid level. Geographic catchment areas, which differ according to the considered grid level, were the basis for the spatial assignment to substations. The catchment areas of EHV substations were defined by a Voronoi partition, a well-known approach used in different publications [3,33]. In [10], this approach is expanded using administrative borders as an additional input for the representation of the underlying MV grids.
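Membership in a Voronoi cell is equivalent to being closer to that cell's generating point than to any other, so the EHV catchment areas can be illustrated with a nearest-substation assignment. The sketch below assumes planar (projected) coordinates and uses hypothetical identifiers; it does not cover the extension with administrative borders described in [10].

```python
import math


def assign_to_nearest_substation(points, substations):
    """Map each point id to the id of the closest substation.

    points and substations are dicts of id -> (x, y) in a projected
    coordinate system (an assumption of this sketch).
    """
    assignment = {}
    for point_id, (px, py) in points.items():
        assignment[point_id] = min(
            substations,
            key=lambda s: math.hypot(px - substations[s][0], py - substations[s][1]),
        )
    return assignment


# Hypothetical coordinates in kilometres.
substations = {"S1": (0.0, 0.0), "S2": (10.0, 0.0)}
points = {"load_a": (2.0, 1.0), "gen_b": (9.0, -3.0), "load_c": (6.0, 0.0)}
assignment = assign_to_nearest_substation(points, substations)
print(assignment)
```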

Load allocation
Data on the electricity consumption throughout Germany for the reference year 2011 is available on the level of the federal states [34]. This spatial resolution does not meet the requirements of the intended power flow simulations. Therefore, an allocation of the German electricity demand was implemented. The approach used to distribute and allocate loads with a high spatial resolution is described in detail in [10]. In this approach, information about the spatial distribution of population, the gross value added and the industrial and retail area served as the basis for a spatial allocation of electricity demand. The assignment of distributed loads to their relevant bus was achieved using grid districts as a representation of the underlying MV grid for loads connected to the lower grid levels. Voronoi cells were applied for the allocation of large-scale industrial consumers directly connected to the EHV level.
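The basic idea of distributing a state-level total according to a spatial indicator can be sketched as a proportional allocation. This is a strongly simplified illustration and not the method of [10]; the choice of population as the only weight, the function name and all numbers are assumptions.

```python
def allocate_proportionally(total, weights):
    """Split a total among keys in proportion to non-negative weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return {key: total * weight / weight_sum for key, weight in weights.items()}


# Hypothetical example: 1,000 GWh of residential consumption in one federal
# state, distributed to three grid districts by population.
population = {"district_1": 50000, "district_2": 30000, "district_3": 20000}
residential_gwh = allocate_proportionally(1000.0, population)
print(residential_gwh)
```

The same function could be applied per sector with a different indicator each time, for example gross value added for industrial demand.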

Generation allocation
To link data on electricity generation with our grid topology, we derived information on power plants from two different registries for the status quo scenario: the conventional power plant list [20], containing large, mainly fossil power plants > 10 MW, and a renewable power plant list [21]. All records in these registries are georeferenced. In the renewable power plant registry, the minimum geographical accuracy is the maximum distance within a post code area. The conventional power plant data were manually georeferenced within the OPSD project using their addresses. We assigned generators to a voltage level, followed by a geographical allocation to the respective substation based on its catchment area. In our model, all offshore wind parks were connected to the EHV level. Information on the grid connection points of present and future offshore wind parks is available in [35] and was used to assign offshore wind units to their respective substations in the developed grid model. For future wind parks which, according to [35], are connected to planned substations not represented in the grid model, an existing nearby substation was chosen.
For the NEP 2035 scenario, information on conventional power plants was extracted from [36] and georeferenced automatically or manually. From this source, only the large hydro and pumped storage plants are included in the eGo 100% RES scenario.
Besides the conventional power plants individually listed in the aforementioned source, aggregated future capacities need to be distributed geographically to meet the need for a high spatial resolution. For the NEP 2035 scenario, the expansion of installed capacity per technology and federal state in Germany served as the starting point, whereas the allocation for the 100% scenario was based on the assumed installed capacity at country level. The methods and assumptions of the spatial allocation differ depending on the type of RE generation.
For geothermal power plants no changes were assumed for future scenarios. The development of biomass, small gas and hydro power plants was expected to increase proportionally to its specific value of installed capacity based on the status quo in 2014. Due to the high development of small CHP plants within the NEP scenario, a geographical allocation method of the additional plants was developed and used [37].
The location of offshore wind units in the NEP 2035 scenario was taken from the Netzentwicklungsplan Offshore 2025 (O-NEP) [35,38]. New onshore wind and photovoltaic systems were likewise assumed to develop proportionally, starting from the status quo. These plants were allocated at the municipality level, which is sufficient for analyses of the HV and EHV grid. The initial information was the number and size of renewable power plants per municipality. Based on this, representative units were defined for each municipality and were then proportionally extended in future scenarios. The location of the new power plants is the centroid of the respective municipality. A more detailed allocation process, which becomes necessary when addressing low and medium voltage levels, is part of [8].
The allocation for the 100% scenario data is based on the same methods and on scenario-specific assumptions of installed capacities per technology. The number of power plants increases from the status quo scenario to the 100% scenario, especially due to the growing number of renewable power plants, which affects the complexity of the analyses per scenario.

Time series of load and generation
Calculations on the grid model require a suitable characterization of electricity demand and supply at grid nodes. This was achieved by describing demand and supply characteristics with time series of one year, which allows for a detailed analysis of the electricity system with a focus on grid operation. This data must reflect generation and demand of a certain area to which the grid node is associated.

Demand time series
The aim was to obtain a consistent set of demand data valid across all grid levels in Germany.
Numerous approaches exist to model residential electricity load curves [39]. These range from bottom-up modeling of residential load curves considering behavioural factors [40] to electricity load curves of the German residential sector at high resolution [41]. The mentioned approaches are rather problem-specific and do not provide a complete data set for all sectors considered in this study (residential, industrial, agricultural and retail). A comprehensive approach for modeling electricity demand across grid levels is not available either. This involves an additional challenge, as the coincidence of electricity demand varies across grid levels and aggregation levels. The shape of a load curve depends on the number of customers represented. The shape of a cumulative load curve of more than 1,000 customers no longer changes [42]. In Germany, electricity demand characteristics at the low-voltage grid level are usually described by standard load profiles (SLP) [43]. This information is provided by sector (residential, retail, agricultural) and at a temporal resolution of 15 minutes. The data is based on measurements of 332 households. The authors state that it is valid as a cumulative load for a number of customers with similar characteristics, without specifying any quantity [44]. Willis and Scott argue that the cumulative demand curve of as few as 100 households "...looks smooth and 'well-behaved.'" [42]. Thus, we assume SLP are valid in a range of 100 to 300 residential customers. In addition to the demand patterns for these three sectors, a pattern for the electricity demand of the industrial sector was constructed based on a step function [45]. The industrial demand pattern considered peak and off-peak times. During a work week at daytime (6 a.m. to 10 p.m.), the normalized load curve is set to 0.8. At other times, this parameter was set to 0.6.
The spatially highly resolved and sectorally disaggregated annual consumption C_j,s (described above) was converted to hourly demand time series P_demand,j by applying sectoral demand patterns P_SLP,s(t) (see Equation 7).
where s refers to the sectors residential, retail, industrial and agricultural and j denotes the substation. We assumed a power factor cos φ of 0.95 (inductive) for aggregated loads at the HV substation [46]. Annual consumption, originally determined for each load area [10], was aggregated to HV grid nodes.
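The conversion from annual consumption to hourly demand can be sketched as follows. The sketch assumes that each sectoral pattern is first normalised so that it sums to one over the year and is then scaled by the annual consumption of that sector, with the results summed over sectors; this is one plausible reading of Equation 7, not a reproduction of it. The step values 0.8 and 0.6 and the power factor of 0.95 come from the text; public holidays are ignored and all function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

POWER_FACTOR = 0.95  # cos(phi), inductive, from the text


def industrial_pattern(year=2011):
    """Hourly step pattern: 0.8 on workdays from 6 a.m. to 10 p.m., else 0.6."""
    start = datetime(year, 1, 1)
    hours = (start + timedelta(hours=h) for h in range(8760))
    return [0.8 if (t.weekday() < 5 and 6 <= t.hour < 22) else 0.6 for t in hours]


def demand_series(annual_consumption_mwh, patterns):
    """Sum over sectors of annual consumption times the normalised pattern."""
    n_steps = len(next(iter(patterns.values())))
    demand = [0.0] * n_steps
    for sector, consumption in annual_consumption_mwh.items():
        pattern = patterns[sector]
        scale = consumption / sum(pattern)  # pattern then sums to the annual energy
        for i, value in enumerate(pattern):
            demand[i] += scale * value
    return demand


def reactive_power(active_power):
    """Reactive power for a fixed power factor: Q = P * tan(arccos(cos phi))."""
    ratio = math.tan(math.acos(POWER_FACTOR))
    return [p * ratio for p in active_power]


# Hypothetical substation with 1,000 MWh of industrial consumption per year.
pattern = industrial_pattern()
p_mw = demand_series({"industrial": 1000.0}, {"industrial": pattern})
q_mvar = reactive_power(p_mw)
print(len(p_mw), round(sum(p_mw), 3))
```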

Generation time series
For each scenario, generation time series are based on hourly global power market simulation of Germany and its neighbouring countries. The open source model renpassG!S, which is an application of the Open Energy Modelling Framework (oemof) and the oemof library Feedinlib, was applied for this [31,48,49].

Application of the data model in power flow calculations
The data processing methods described in Sections 2 to 6 provide the necessary basis for power flow simulations and optimizations at a high temporal and spatial resolution. Exemplary results of these analyses are presented in this section. The open source tool PyPSA, which was integrated into the eTraGo app, was the basis for the simulations and allows several power flow methods. Besides the standard PF and LPF simulation methods, an LOPF method optimizes power plant dispatch considering network constraints. The grid data model is suitable for all of the methods mentioned above; the analyses and results presented in this contribution are, however, based on LOPF. The LOPF additionally allows investment optimization of capacity extension, which will be a major part of future open eGo modelling but is not considered in this work. Out of the three scenarios defined (see Section 4), the focus in the following is on the status quo and NEP 2035 scenarios. The exemplary results are based on a 24-hour optimization of the day with the lowest overall residual load of the modelled year (7th of April). Consequently, the share of renewable power feed-in is at its highest [50]. To approximate the (n-1) security requirements, the maximum branch capacities were globally derated to 70% of the individual thermal limits [51].
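The two preparatory steps mentioned above, selecting the day with the lowest residual load and derating the branch capacities, can be sketched in plain Python. Residual load is taken here as load minus renewable feed-in, which is an assumption about the definition used; the function names and the three-day example data are hypothetical, and the sketch does not use the PyPSA API.

```python
SECURITY_FACTOR = 0.7  # from the text: capacities derated to 70 % of thermal limits


def day_with_lowest_residual_load(load_mw, renewable_mw):
    """Return the zero-based index of the 24-hour day with the lowest residual load."""
    residual = [load - feed_in for load, feed_in in zip(load_mw, renewable_mw)]
    n_days = len(residual) // 24
    daily_sum = [sum(residual[d * 24:(d + 1) * 24]) for d in range(n_days)]
    return min(range(n_days), key=daily_sum.__getitem__)


def derated_capacity(s_nom_mva, factor=SECURITY_FACTOR):
    """Approximate (n-1) security by scaling the thermal limit."""
    return factor * s_nom_mva


# Hypothetical three-day series: constant load, varying renewable feed-in.
load = [100.0] * 72
renewables = [20.0] * 24 + [90.0] * 24 + [50.0] * 24
best_day = day_with_lowest_residual_load(load, renewables)
limit = derated_capacity(260.0)
print(best_day, limit)
```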
For the sake of problem feasibility, load could be shed at each bus at a value of lost load of 10,000 Euro/MWh [52]. The reasons for the feasibility problems are currently being researched. In this context, the high number of lines and buses represents a challenge. Analyses showed that load shedding occurs only at a small number of buses (0.6 %) located in the 110 kV level. These critical spots are located mostly in big cities such as Berlin and Munich, where the share of underground cables is rather high and the OSM mapping quality therefore low. The poor mapping quality (concerning the number of cables, for example) is the most probable explanation for these few local feasibility problems. Moreover, the general assumptions on electrical parameters (concerning the nominal capacities, e.g. the number of wires) or weaknesses in the allocation of loads and generators, which could for instance result in too much load being allocated to the 110 kV grid level, are considered possible sources.
The relative line loading of one hour for the status quo and NEP 2035 scenario can be examined in Figure 2. In the status quo scenario, 392 power lines are loaded to more than 50 % of their thermal limits. In the future scenario, 747 power lines exceed this relative value. In the status quo scenario, particularly the 110 kV lines in Schleswig-Holstein showed high usage rates. In contrast, in the 2035 scenario the line loading increases in multiple regions, whereas it decreases on many of the 110 kV lines in Schleswig-Holstein. The nearly omnipresent abundance of renewable power plants (particularly wind power plants) and supraregional grid restrictions (the status quo grid was applied for the 2035 scenario) cause these characteristics, including substantial curtailment of RE. The modelled transformers do not show any critical congestion. The data model allows the calculation of 8,760 hours for each scenario year. In Figure 3, the stacked generation for one day within the 2035 scenario year is displayed. At [...] generation mix corresponds to the line loading shown on the right side of Figure 2. At this hour, almost the entire load of Germany and its electrically neighbouring countries is covered by renewable generation. The nuclear power generation originates from power plants located outside of Germany (e.g. France). The lignite and hard coal fired power plants can hardly operate during this midday period. During this day, 36 % of the available solar and wind energy is curtailed. Due to feasibility problems, load shedding can be observed during daytime when demand is high. Its share in total demand throughout this day, at 0.4 %, is relatively low.
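The line loading statistic reported above can be reproduced from optimization results with a few lines of code. The sketch below relates the absolute flow of each line to its thermal limit and lists the lines above a threshold; the data layout, the function names and the example values are hypothetical.

```python
def relative_loading(flow_mva, s_nom_mva):
    """Absolute flow divided by the thermal limit, per line."""
    return {line: abs(flow_mva[line]) / s_nom_mva[line] for line in flow_mva}


def lines_above(loading, threshold=0.5):
    """Names of lines whose relative loading strictly exceeds the threshold."""
    return sorted(line for line, value in loading.items() if value > threshold)


# Hypothetical flows for one hour; negative values indicate reversed direction.
flows = {"line_1": 150.0, "line_2": -40.0, "line_3": 130.0}
limits = {"line_1": 260.0, "line_2": 260.0, "line_3": 260.0}
loading = relative_loading(flows, limits)
congested = lines_above(loading)
print(congested)
```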
The results presented in this Section based on an exemplary day indicate that the open source grid data model allows extensive analyses of power plant dispatch under consideration of grid restrictions. The existing uncertainties regarding the problem feasibility and general complexity are essential parts of current and future research.

Conclusion and Outlook
An approach to create a model integrating the existing German EHV and HV grid was developed and implemented. The integration of the 110 kV grid represents an innovation in the field of open data grid models for Germany. At the same time, the electrically neighbouring countries were abstracted considering their electrical and dispatch influence. The result is a plausible grid topology model with a high number of buses (11,294) and branches (19,605), which is a suitable basis for state-of-the-art power flow methods (PF, LPF and LOPF).
Completing the developed model, electricity demand and generation were defined for three different scenarios in order to derive results for the present, the mid and long-term. The allocation and aggregation of demand and generation units to each HV-MV substation (3,608) considering an interface to the MV and LV levels represents an adequate degree of spatial resolution. Per scenario hourly values were determined for each unit or aggregate over the course of one year. This allows research on dispatch and redispatch throughout an entire year considering realistic weather and demand simultaneities. Moreover, dispatch optimization for storage units which are characterized by substantial intertemporal dependencies can be realized.
The exemplary results in Section 7 showed that it is possible to calculate power flows based on the allocated generation and demand over a period of time. The spatial resolution enables the identification of decentral and central congestion of particular power lines within the same optimization problem. The weather-dependent behaviour of RE generation as well as the marginal-cost-driven dispatch optimization was shown to work plausibly.
The entire model (grid, generation, demand) has to be tested and validated in detail. The methodologies presented show high potential and produce convincing exemplary results. However, the results also reveal some difficulties. Minor load shedding at a few buses was used to reach a feasible optimization problem. Recently, grid operators' publications were used to validate and vary line assumptions within the EHV level [53]. Modifications can therefore be deduced in the near future. In this context, the use of the OSM tag wires, which can be applied on the EHV and HV levels, is also being discussed. As the tag wires represents the number of conductors per phase for a power line, an integration of this tag may lead to more detailed information on the transfer capacities. Neglecting the cross-border line capacities between the neighbouring countries influences the model results (especially because loop flows are not considered). Such interconnections can be implemented in the near future and are expected to enhance the model. The further development is a continuous process, which is facilitated by the applied open source standards. All developed methodologies are reproducible and in this sense meet scientific standards to a high extent.
Based on the presented grid model, future contributions will focus on the optimization of eventual investment decisions concerning storage and grid expansion. The spatial and temporal complexity and a high number of optimization variables lead to immense computation times when applying deterministic optimization methods. Accordingly, complexity reduction methods (e.g. snapshot clustering) are currently being developed.