Building and Calibrating a Country-Level Detailed Global Electricity Model Based on Public Data

Deep decarbonization of the global electricity sector is required to meet ambitious climate change targets. This underlines the need for improved models to facilitate an understanding of the global challenges ahead, particularly on the concept of large-scale interconnection of power systems. Developments in recent years regarding the availability of open data, as well as improvements in hardware and software, have stimulated the use of more advanced and detailed electricity system models. In this paper we explain the process of developing a first-of-its-kind reference global electricity system model with over 30,000 individual power plants representing 164 countries spread out over 265 nodes. We describe the steps in the model development, assess the limitations and existing data gaps, and showcase the robustness of the model by benchmarking calibrated hourly simulation results against historical emission and generation data at country level. The model can be used to evaluate the operation of today's power systems or can be applied in scenario studies assessing a range of global decarbonization pathways. Comprehensive global power system datasets are provided as part of the model input data, with all data being openly available under the FAIR Guiding Principles for scientific data management and stewardship, allowing users to modify or recreate the model in other simulation environments. The software used for this study (PLEXOS) is freely available for academic use.


Introduction
In energy systems literature, modelled global pathways limiting global warming to 1.5 °C generally meet energy service demand with lower energy use and significant electrification of energy end use [1,2]. These requirements signal a potential system transition in global electricity generation, and the role of increased interconnection becomes an important question. Large-scale modelling of continental power systems can facilitate a better understanding of potential pathways towards a zero-carbon supply of our future energy needs, yet to date research in this area is limited by a lack of detailed global electricity models [3].
Due to limitations in either computational complexity or data availability, electricity system modelling studies tend to make a trade-off between the spatial scale of the study area and the technical representation of power plant characteristics and transmission components. In modelling studies on a multi-country scale, a single-node-per-country copperplate approach is generally applied [4][5][6][7] and technical properties such as turbine unit sizes, heat rates, and start-up costs [4,8,9] are usually represented in a standardized manner with uniform characteristics for every individual power plant of a certain type. This approach is acceptable for long-term scenario studies where the development of power plants and their technological characteristics is uncertain, yet for realistic assessments of today's electricity system a finer representation of the diversity in power plant and electricity system characteristics is preferable.
There are a limited number of modelling studies assessing electricity systems from a global perspective. This can partly be explained by the aforementioned issues, yet an additional factor is that the use of a global electricity model is generally seen as unnecessary and even impractical. Unlike most other energy carriers, electricity to-date is produced and consumed domestically or exchanged between several countries within a region or continent. That said, interest in the concept of long-distance electricity transmission and the potential evolution towards an interconnected global grid has gained significant traction in the last few years [3,10,11], resulting in a range of modelling studies on this topic [7][12][13][14][15][16][17]. Other research utilizing global electricity models focuses on feasibility assessments of possible 100% renewable energy systems, without the utilization of low-carbon technologies such as nuclear energy, carbon capture and storage (CCS) [18,19] or even bioenergy [18].
In order to provide improved insights into the diversity of the world's electricity system we developed 'PLEXOS-World', a detailed global electricity model capable of simulating over 30,000 existing power plants using public data. Although the issues of computational intensity and data access are still relevant, developments in recent years regarding faster computers, improved solvers and solving techniques [20], as well as relevant open electricity system data initiatives [21][22][23] have made this project possible. An assessment by Pfenninger and colleagues of the use of open data and software within energy policy research indicates that it generally lags behind other fields of research [24]. This study makes an extended effort to address this gap by demonstrating the potential of open power system data as well as the openness of the model itself. The PLEXOS-World model is openly accessible for any PLEXOS user, with the software being freely available for academic use. The model in raw data format and all model input data are openly available and can be retrieved from the supplementary datasets [25], allowing users to modify or recreate the model in other simulation environments.
In this paper we describe the process of building a detailed global electricity model at plant and country level. Section 2 includes the methodology, a full overview of the data inputs and any assumptions made. A benchmarking exercise of calibrated simulation results with historical emission and generation data to secure accurate model performance is included in section 3. The paper concludes in section 4 with a discussion of the findings, the existing limitations and data gaps and an outlook on possible future work based on the developed model.

Data input and methodology
This section introduces the software used to simulate the global electricity model, describes the main methods and assumptions and gives a full overview of the input data.

Unit Commitment & Economic Dispatch model
The software used in this study to solve the Unit Commitment & Economic Dispatch (UCED) problem in the global electricity model is PLEXOS. PLEXOS is a transparent electricity system modelling tool used for electricity market modelling and planning. Detailed linear equations can be queried, modified and viewed by the user to facilitate a deeper understanding of model dynamics. The equations as applied for this study can be found in section 1 (S1) of the supplementary material [25]. All data input is fully customizable and the tool facilitates use of a range of open source (GLPK, SCIP) and commercial (CPLEX, Gurobi, MOSEK, Xpress-MP) solvers depending on preference and access to licenses. PLEXOS comes with a built-in user interface in which data management, model building and simulation can all be done, yet it also supports automation of data flows and model simulation from outside the user interface by means of COM or .NET. The software package comes with detailed documentation of all features. Modelling can be carried out using Mixed Integer Linear Programming (MILP), which minimizes an objective function comprising the expected cost of thermal and renewable electricity dispatch, subject to a range of technical constraints. It is also possible to select Linear Programming (LP) for the model simulation to limit the computational complexity, albeit with lower detail in technical parameters. In the default setup of the software each time step is modelled in sequence and is linked to the previous step for initial conditions. PLEXOS also provides the option to perform model simulations in a parallel fashion, meaning that otherwise chronological time steps can be simulated at once while spread out over multiple cores, after which results are 'stitched' back together. This approach has the advantage of optimized utilization of computational resources, with the trade-off being reduced accuracy because cross-period parameters (e.g. number of online generator units) are not tracked between steps. A comparison of the runtime performance of both approaches in the context of PLEXOS-World can be found in Table 1. For the simulations in this study we applied MILP with linked time steps for optimal accuracy.
The objective function of the model includes operational costs, consisting of fuel costs and start-up costs, the latter comprising a fuel offtake at start-up of a unit and a fixed unit start-up cost. Penalty costs for unserved energy and a penalty cost for not meeting reserve requirements can also be included in the objective function. Fuel consumption is calculated using piecewise linear functions based on the generator heat rate. System level constraints consist of an energy balance equation ensuring supply meets the regional demand at each simulation period. Water balance equations ensure water flow within pumped storage units is conserved and tracked. Constraints on unit operation include minimum and maximum generation, minimum and maximum up and down time and ramp-up and ramp-down rates. A zonal pricing methodology is applied with an assumed perfect market across the globe without consideration of market power or competitive bidding practices. A large number of open energy models are available covering different energy sectors and varying geographical regions. PLEXOS-World's configuration is similar in set-up to other UCED models (for example Dispa-SET), but has a simplified representation of cross-border transmission by making use of Net Transfer Capacities (NTC).
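To make the structure of the objective function more tangible, the following is a minimal single-period sketch in Python. It is not the PLEXOS formulation (see S1 of the supplementary material [25] for that): start-up costs, minimum stable levels, ramp rates, reserves and transmission are all ignored, the heat rate is a single segment rather than piecewise linear, and the class, function and parameter names as well as all numbers are hypothetical. Under these simplifications the least-cost dispatch reduces to a merit order with a penalty on unserved energy.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    name: str
    capacity_mw: float           # maximum generation
    heat_rate_gj_per_mwh: float  # single-segment heat rate (assumption)
    fuel_price_per_gj: float

    @property
    def marginal_cost(self) -> float:
        # Fuel cost per MWh generated.
        return self.heat_rate_gj_per_mwh * self.fuel_price_per_gj


def dispatch(generators, demand_mw, voll=10_000.0):
    """Least-cost dispatch for one period.

    Returns (schedule, unserved_mw, total_cost). `voll` is a hypothetical
    penalty per MWh of unserved energy.
    """
    remaining = float(demand_mw)
    schedule = {}
    cost = 0.0
    # Cheapest units are loaded first (merit order).
    for gen in sorted(generators, key=lambda g: g.marginal_cost):
        output = min(gen.capacity_mw, max(remaining, 0.0))
        schedule[gen.name] = output
        cost += output * gen.marginal_cost
        remaining -= output
    unserved = max(remaining, 0.0)
    cost += unserved * voll  # penalty term of the objective function
    return schedule, unserved, cost


# Illustrative fleet; none of these values are taken from the model data.
fleet = [
    Generator("coal", 500.0, 10.0, 2.0),
    Generator("gas", 300.0, 7.0, 6.0),
    Generator("oil", 100.0, 11.0, 10.0),
]
schedule, unserved, cost = dispatch(fleet, 700.0)
```

Once integer commitment decisions and inter-temporal constraints are added, the greedy ordering is no longer sufficient and a MILP solver is required, which is the setting used in this study.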

Spatial and temporal representation
PLEXOS-World covers the electricity systems of 164 countries, subdivided into a total of 265 nodes. Larger countries, both in terms of size as well as relative electricity demand, are spread out over multiple nodes allowing for the integration of regional diversity as well as time-zone differences. This is the case for Australia (7 nodes), Brazil (10 nodes), Canada (9 nodes), China (34 nodes), India (5 nodes), Japan (6 nodes), Russia (7 nodes) and the United States (24 nodes). Subdivision of nodes is generally based on geographical borders, operating areas of different authorities or following the availability of data. See Fig. 1 for an overview of the nodal representation in PLEXOS-World and S4 of the supplementary material [25] for a full list of nodes. S2 of the supplementary material can be consulted for more details on the approach of sub-country division of nodes and data.
The model is set up to run for the 2015 calendar year, with customizable timesteps that can be adjusted to the aim of the study and the size of the simulated model. Typically, two-hourly, hourly or 5-min intervals are used. 2015 has been chosen as simulation year due to restrictions on data availability for more recent years. Continents and nodes can be manually selected or deselected based on the user's preferences, keeping in mind that changing the spatial or temporal resolution can significantly affect the computational intensity of the simulation. Hourly simulations are generally sufficient to get a basic understanding of the optimal UCED, yet to incorporate ramping constraints of generator units or to assess aspects such as system inertia, sub-hourly modelling is advisable [26]. The input data for demand and variable renewables (VRES) time-series are based on hourly patterns, yet the software linearly interpolates data values in case sub-hourly modelling is required. Hourly intervals are used for the simulations in this study based on daily time steps with a 6 h look-ahead.
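The daily stepping with a 6 h look-ahead can be illustrated with a short Python sketch that enumerates the optimisation windows for the 2015 calendar year. The function name and the truncation of the final look-ahead at the end of the horizon are assumptions made for illustration; how PLEXOS treats the look-ahead period internally is not described here.

```python
from datetime import datetime, timedelta


def rolling_windows(start, end, step_hours=24, lookahead_hours=6):
    """Yield (step_start, step_end, solve_end) for each optimisation step.

    Results are kept for [step_start, step_end); the solver additionally
    sees the interval up to solve_end so that end-of-day decisions account
    for the first hours of the next day.
    """
    step = timedelta(hours=step_hours)
    lookahead = timedelta(hours=lookahead_hours)
    current = start
    while current < end:
        step_end = min(current + step, end)
        solve_end = min(step_end + lookahead, end)  # truncated at horizon end
        yield current, step_end, solve_end
        current = step_end


windows = list(rolling_windows(datetime(2015, 1, 1), datetime(2016, 1, 1)))
```

Each window corresponds to one linked step: the end state of a step (for example the number of online units) becomes the initial condition of the next.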

Technical representation and input data
The model draws solely on public sources of information for input data. The sources and accompanying assumptions for this study are introduced in the next sections. Fig. 2 gives an overview of the different steps within the modelling process as well as of the different sources and their interrelationships with the data inputs. The steps and data as used for the calibration exercise are also shown. Note that the data in the model is from the best available public sources, but users of the model are free to change and edit any data if more advanced local or site-specific data is at hand.

Power plant portfolios
The World Resources Institute (WRI), in collaboration with the Global Energy Observatory, Google, KTH Royal Institute of Technology in Stockholm and Enipedia, has made extended efforts to create the first open access Global Power Plant Database covering more than 85% of global capacity [21]. The WRI database differentiates power plants per fuel type and has integrated geolocations. It has been used as the primary source for power plant capacity data for PLEXOS-World. Approximately 55% of power plants in the WRI database have a commissioning year attached. For the remaining 45% it is unclear whether these power plants were already operational as of 2015. Power plants for which it is known that they became operational after 2015 are incorporated in the model yet are 'turned off' (units are set to zero) for simulations of the 2015 calendar year. The geolocations were used to allocate power plants to the relevant nodes. Fig. 3 shows a visualization of the power plant data with the height of the bar indicating the relative capacity size. This visualization does not only reflect the differences in density of power plants between regions, but also highlights the data gap of the missing 15% of global power plant capacity. The coverage in developing regions, as well as countries such as China, India and Russia is not fully exhaustive. Furthermore, wind and solar coverage is limited due to the more decentralized nature of these technologies. The remaining power plant capacity not accounted for in the WRI database has been incorporated using standardized generators per country and per technology based on a number of quality sources such as the EIA [27], ENTSO-E [28], IEA [29,30], IRENA [31] and India's Central Electricity Authority [32]. 
For smaller countries where no diversified fossil capacity data exists within the above sources, it is assumed that the relative share of coal, gas and oil capacity per country within the WRI database can be used to scale up to the reported aggregate fossil capacity as indicated by the EIA [27]. Due to a lack of sub-country capacity data for especially China, Japan and Russia, it is assumed that missing capacity in these larger countries can be spread out relative to the share of existing capacity per technology per sub-country node in the WRI database.
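The proportional scale-up described above can be sketched as follows. This is a minimal illustration assuming the missing fossil capacity is the reported aggregate minus the WRI total and that it is allocated by the WRI fuel shares; the function name and the example numbers are hypothetical.

```python
def allocate_missing_fossil_capacity(wri_capacity_mw, reported_total_mw):
    """Split the fossil capacity missing from the WRI database over fuels.

    wri_capacity_mw: dict fuel -> capacity present in the WRI database.
    reported_total_mw: aggregate fossil capacity reported for the country.
    Returns dict fuel -> capacity to add as standardized generators.
    """
    wri_total = sum(wri_capacity_mw.values())
    if wri_total <= 0:
        raise ValueError("No WRI fossil capacity available to derive shares")
    shortfall = max(reported_total_mw - wri_total, 0.0)
    return {
        fuel: shortfall * capacity / wri_total
        for fuel, capacity in wri_capacity_mw.items()
    }


# Illustrative values only.
missing = allocate_missing_fossil_capacity(
    {"Coal": 600.0, "Gas": 300.0, "Oil": 100.0}, reported_total_mw=1500.0
)
```

The same share-based logic applies to the sub-country allocation for China, Japan and Russia, with node-level shares per technology taking the place of fuel shares.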
Power plant capacity data in the WRI database is supplied in an aggregate format without differentiating individual turbine unit sets per power plant. To be able to incorporate generator characteristics such as minimum stable levels and ramp rates, and to assess system inertia contributions, it is important to disaggregate the power plant capacity data into individual units. This is done by utilizing a standard unit size methodology per fuel type as applied in earlier studies [5,13,33], both for the WRI database data as well as for the missing capacities, with the standard turbine unit sizes per generator type indicated in Table 2. Other renewable power plants such as solar and wind power plants, as well as all storage technologies other than Pumped-Storage Hydro (PSH), use the capacities as given by the different databases. Note that Concentrated Solar Power (CSP) to-date is not included as a separate power plant type because the WRI database does not differentiate between different solar technologies. It has been assumed that gas power plants in the WRI database with a capacity <130 MW represent open cycle gas turbines (OCGT) and that those with a capacity >130 MW represent combined cycle gas turbines (CCGT). The number of units per power plant U (rounded upwards) can be calculated with (1), with MWt being the total nameplate capacity of the power plant and MWst the standard unit size of the relevant technology. Consequently, the MW capacity per unit C equals (2).
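The disaggregation step can be sketched in Python as follows, reading (1) as U = ceil(MWt / MWst) and (2) as C = MWt / U, which is how the surrounding text defines them. The standard unit sizes below are placeholders and not the Table 2 values, and assigning a plant of exactly 130 MW to CCGT is an assumption, since the text only specifies the cases below and above the threshold.

```python
import math

# Placeholder standard unit sizes in MW (hypothetical, NOT the Table 2 values).
STANDARD_UNIT_MW = {"Coal": 300.0, "CCGT": 400.0, "OCGT": 50.0}


def classify_gas(capacity_mw, threshold_mw=130.0):
    """Gas plants below the threshold are treated as OCGT, otherwise CCGT."""
    return "OCGT" if capacity_mw < threshold_mw else "CCGT"


def disaggregate(total_mw, standard_unit_mw):
    """Return (U, C): number of units (rounded up) and capacity per unit."""
    if total_mw <= 0 or standard_unit_mw <= 0:
        raise ValueError("Capacities must be positive")
    units = math.ceil(total_mw / standard_unit_mw)  # equation (1)
    capacity_per_unit = total_mw / units            # equation (2)
    return units, capacity_per_unit


coal_units, coal_unit_mw = disaggregate(1000.0, STANDARD_UNIT_MW["Coal"])
gas_type = classify_gas(450.0)
gas_units, gas_unit_mw = disaggregate(450.0, STANDARD_UNIT_MW[gas_type])
```

Because U is rounded upwards, the resulting unit capacity C never exceeds the standard unit size, and the units of a plant always sum to its nameplate capacity.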
Generic relationships have been derived based on historical power plant data to calculate generator specific heat rates and start costs depending on the capacity per turbine unit. By using the constants SCa and SCb as included in Table 2, the specific start cost SC per unit C can be calculated with (3). These characteristics are modifiable by users and available as part of the model input data.
Similarly, the generator specific heat rate can be calculated with (4), by using the constants HRd, HRe, and HRf.
In unconstrained model runs, baseload power plants such as coal (2015 context with higher gas prices), nuclear, biomass and geothermal are overutilized compared to historical data. In real life, generators can be limited in their operation due to a variety of factors such as outages, maintenance, limitations in fuel supply or policy-based constraints. Data regarding restrictions in operation at power plant level are not available within the public domain, hence for these baseload technologies we incorporated operational constraints specified per country and technology which force generator units to be 'turned off' for part of the simulation horizon. IEA's 'Electricity Information' [30] provides insights into generation values for 2015 per country and fuel type. The difference between these values and the combined power output of all power plants per country and fuel type in the unconstrained model run can be used as an indicator for the initial size of the required operational constraints. Through an iterative process with model simulations, these initial values have been calibrated up or down until further change negatively impacted the match with reported historical generation.
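One possible way to turn the generation difference into a starting value for the calibration is sketched below. Expressing the constraint as the share of the horizon during which units are unavailable is an assumption made for illustration; the text only states that the difference serves as an indicator, and the function name and numbers are hypothetical.

```python
def initial_offline_share(unconstrained_gwh, reported_gwh):
    """First guess for the share of time a technology is forced offline.

    Uses the relative excess of unconstrained model output over reported
    generation, clipped to the interval [0, 1].
    """
    if unconstrained_gwh <= 0:
        return 0.0
    excess = unconstrained_gwh - reported_gwh
    return min(max(excess / unconstrained_gwh, 0.0), 1.0)


# Illustrative: the model produces 120 TWh where 90 TWh was reported.
share = initial_offline_share(120_000.0, 90_000.0)
```

Such a value would only be the first iterate; the calibration then adjusts it up or down based on repeated simulations.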

Renewable profiles
The supply of electricity from hydro, solar and wind is determined using location-specific capacity factors (CF). The Renewables Ninja database [23] has been used to extract hourly CF profiles for every on- and offshore wind (5187 in total) and solar (5929 in total) power plant location in the WRI database by making use of the geolocations. The profiles are developed by making use of NASA's MERRA-2 global reanalysis data [34]. The current set of profiles is based on the 2015 meteorological year. Future updates of the model will include a wider range of data years, considering that weather patterns can have significant impact on the operation of electricity systems, especially with increasing VRES integration [35]. Standardized solar and wind power plants integrated to scale up missing capacities within the different nodes make use of an averaged profile based on all CF profiles from within that node. For nodes where no wind or solar power plants exist within the WRI database, a sample of between 4 and 8 patterns per node, spread out over its respective geographical area, has been manually extracted from the Renewables Ninja database.
Initial model simulations indicated that the overall generation of solar and wind per node as a result of the integrated CF profiles was in some cases significantly overestimated compared to historical generation data for 2015 as reported by IRENA [36]. As shown by the authors of the Renewables Ninja database [37], use of the database, in particular for regions outside the EU, requires bias correction. For this reason, we have applied country-level multipliers to the hourly profiles to calibrate overall generation from solar and wind in the model with historic 2015 data.
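A country-level multiplier of this kind can be sketched as the ratio between reported and modelled annual generation. The sketch below assumes hourly time steps and caps scaled capacity factors at 1; the cap and the function names are assumptions, not part of the described method.

```python
def calibration_multiplier(hourly_cf, capacity_mw, reported_generation_mwh):
    """Ratio of reported generation to generation implied by the CF profile."""
    modelled_mwh = capacity_mw * sum(hourly_cf)  # hourly steps: MW x 1 h
    if modelled_mwh <= 0:
        raise ValueError("Profile implies no generation")
    return reported_generation_mwh / modelled_mwh


def apply_multiplier(hourly_cf, multiplier):
    """Scale the profile, keeping capacity factors physically valid (<= 1)."""
    return [min(cf * multiplier, 1.0) for cf in hourly_cf]


# Illustrative four-hour profile for a 100 MW plant.
profile = [0.2, 0.4, 0.6, 0.8]
multiplier = calibration_multiplier(profile, 100.0, 150.0)
calibrated = apply_multiplier(profile, multiplier)
```

For multipliers above 1 the cap removes some energy, so a further iteration would be needed to match the reported total exactly.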
Fig. 3. Visualization of the power plant data of the WRI database [21]. Relative height of the bar is an indicator for the capacity of the specific power plant.
Due to the size of the model, hydro other than PSH is modelled in a simplified manner without actively simulating the use of (cascaded) reservoirs. Location-specific monthly CFs for every hydro power plant (7155 in total) are developed by making use of the Global Reservoir and Dam Database (GRAND) [38] and a study by Gernaat and colleagues [39]. In this latter study, the authors identified over 60,000 potential new locations for hydro power plants and developed monthly water discharge profiles for every new location, as well as for every existing location as identified in the GRAND database, based on 30 years of runoff data. The geolocations of the hydro power plants from the WRI database are matched with the nearest dam from the GRAND database, with every plant above 1 GW matched manually to secure accuracy. The coverage of the GRAND database for dams above 58° latitude is limited, hence for hydro power plants in the Scandinavian countries of Iceland, Finland, Norway and Sweden we use country average profiles as used for earlier studies assessing the European electricity system [5,13,35]. For the northern parts of Canada and Russia we use a country average fully based on GRAND data. The profiles for the standardized hydro power plants used to fill gaps in power plant capacities within the WRI database are based on an average of all profiles of the specific node. Countries without hydro power plants in the WRI database, yet with mentioned capacity following EIA data, are assigned an average profile from a neighboring country. Following [39], the design discharge of hydro turbines is assumed to be based on the 4th highest discharge month in the discharge profiles, meaning that during at least three months of the year spillage of water occurs.
Base profiles for the month specific maximum capacity factor CFt per GRAND location can be calculated with (5), with Qd being the design discharge and Qt being the discharge of month t. Following on to that, to secure accuracy on the macro level, the individual profiles from (5) are scaled by comparing the calculated capacity weighted average CF per country with a country-level 15-year average CF based on historical capacity and generation data from IRENA [36].
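The two steps above can be sketched in Python. Since equation (5) is not reproduced in the text, the form CFt = min(Qt / Qd, 1) is an assumption consistent with the description (spillage in the months where discharge exceeds the design discharge). The uniform scaling to the country average and the equal weighting of months are likewise simplifying assumptions, and all names and numbers are hypothetical.

```python
def design_discharge(monthly_discharge):
    """Design discharge Qd: the 4th highest of the 12 monthly discharges."""
    if len(monthly_discharge) != 12:
        raise ValueError("Expected 12 monthly values")
    return sorted(monthly_discharge, reverse=True)[3]


def monthly_capacity_factors(monthly_discharge):
    """Assumed form of (5): CFt = min(Qt / Qd, 1)."""
    qd = design_discharge(monthly_discharge)
    if qd <= 0:
        raise ValueError("Design discharge must be positive")
    return [min(qt / qd, 1.0) for qt in monthly_discharge]


def scale_to_country_average(profiles, capacities_mw, target_cf):
    """Scale all plant profiles so the capacity-weighted mean CF hits target."""
    weighted = sum(
        cap * sum(p) / len(p) for p, cap in zip(profiles, capacities_mw)
    ) / sum(capacities_mw)
    factor = target_cf / weighted
    return [[cf * factor for cf in p] for p in profiles]


# Illustrative discharge profile (arbitrary units).
discharge = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
cfs = monthly_capacity_factors(discharge)
scaled = scale_to_country_average([[0.5] * 12, [0.3] * 12], [100.0, 100.0], 0.2)
```

In the example the design discharge is 90, so the three months with higher discharge spill water and are capped at a capacity factor of 1.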
Hydro power plants within the WRI database are not differentiated by type of hydro, i.e. run-of-river or reservoir-based systems. Early-stage model simulations indicated that the generation potential for a large share of hydro power plants in months with high CFs was not fully utilized, whereas the occurrence of significant unserved energy in hydro-dominated regions (e.g. Canada) in months with lower CFs indicated the importance of seasonal storage of water for these regions. To mimic the possibility of having a certain flexibility in cross-monthly storage of water for more dispersed generation of electricity, the original profiles were rescaled with (6) to fit within a narrower range of monthly values by calibrating the original minimum (min_old) and maximum (max_old) of the distribution of CFt values of the specific hydro power plant. The adjusted minimum (min_new) and maximum (max_new) values were determined based on an iterative process of model simulations with a hard upper limit set at 80% of the highest Qt of every individual profile. At all times, the capacity weighted average of the profiles within a country equals the 15-year average country CF as identified with the IRENA data. As a last step specifically for this study, scalers have been applied in the calibration exercise to slightly increase or decrease the profiles for 2015 conditions, again following reported country-level generation data from IRENA. All CF profiles as used for this study can be found in [25]. Hydro plants are constrained at a monthly level with the above profiles but are free to provide flexibility and balancing at hourly level.
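The rescaling to a narrower range can be illustrated with a standard min-max transformation. Equation (6) is not reproduced in the text, so the linear form below is an assumption; the new bounds are simply inputs here, whereas in the study they result from iterative calibration.

```python
def rescale_profile(cfs, new_min, new_max):
    """Linearly map a CF profile from [min_old, max_old] to [min_new, max_new].

    Assumed form of (6):
    CF_new = min_new + (CF_old - min_old) * (max_new - min_new) / (max_old - min_old)
    """
    old_min, old_max = min(cfs), max(cfs)
    if old_max == old_min:
        return list(cfs)  # flat profile: nothing to compress (assumption)
    slope = (new_max - new_min) / (old_max - old_min)
    return [new_min + (cf - old_min) * slope for cf in cfs]


# Illustrative: compress a profile spanning 0.2-1.0 into 0.4-0.8.
rescaled = rescale_profile([0.2, 0.5, 1.0], new_min=0.4, new_max=0.8)
```

A transformation of this kind preserves the ordering of months but generally changes the mean of the profile, which is why the capacity weighted country average has to be re-imposed afterwards.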
Yearly CFs for Ocean, Tidal and Wave based power plants have been integrated based on [30]. No seasonality or variability has been included for these technologies to-date.

Storage
Large-scale electricity storage to-date is mostly based on PSH, albeit integration of other storage technologies for balancing of VRES or other ancillary services is becoming more prominent. The US Department of Energy (DOE) Global Energy Storage Database is a regularly updated database of operational and commissioned electricity storage projects. The DOE database provides rated power per project yet does not consistently include storage size (MWh) or charge and discharge efficiencies. Technology-specific full cycle efficiencies are incorporated based on mean values from reported data in [40]. Similarly, indicative hours of storage values from the same study are used to calculate project-specific storage sizes for all technologies apart from PSH. For approximately 130 of the PSH projects, mostly in Europe and the US, actual data on storage size has been retrieved through [41,42] as well as through individual Wikipedia pages as best indication. Based on this project data, a calculated average ratio (MWh/MW) between storage size and power rating for PSH of 18.9 has been determined after exclusion of outliers with a ratio above 200. This average ratio has been applied to all PSH projects where storage size data was missing. Altogether, the model incorporates over 1100 operational electricity storage projects, of which 323 are PSH.
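The gap-filling for PSH storage sizes can be sketched as follows. Whether the reported average of 18.9 is a simple mean of project ratios or a capacity-weighted mean is not stated; the sketch assumes a simple mean, and the project records are invented for illustration.

```python
def average_storage_ratio(projects, max_ratio=200.0):
    """Mean MWh/MW ratio over projects with known storage size, sans outliers."""
    ratios = [p["mwh"] / p["mw"] for p in projects if p.get("mwh") is not None]
    kept = [r for r in ratios if r <= max_ratio]
    if not kept:
        raise ValueError("No projects with usable storage size data")
    return sum(kept) / len(kept)


def fill_storage_size(projects, ratio):
    """Assign storage size = rated power x ratio where it is missing."""
    return [
        {**p, "mwh": p["mwh"] if p.get("mwh") is not None else p["mw"] * ratio}
        for p in projects
    ]


# Invented example records; C is an outlier (500 MWh/MW), D lacks a size.
psh = [
    {"name": "A", "mw": 100.0, "mwh": 1000.0},
    {"name": "B", "mw": 200.0, "mwh": 4000.0},
    {"name": "C", "mw": 10.0, "mwh": 5000.0},
    {"name": "D", "mw": 300.0, "mwh": None},
]
ratio = average_storage_ratio(psh)
filled = fill_storage_size(psh, ratio)
```

For non-PSH technologies the same fill step applies, with the indicative hours of storage from [40] taking the place of the calculated ratio.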

Hourly demand data
Availability of hourly public demand data for countries outside Europe and North America is limited. A common approach in electricity system modelling studies for regions outside these areas is therefore to use standardized profiles from other countries (mostly European) and adapt the profiles based on locational characteristics [12,15,43]. Extended efforts have been made to integrate a more detailed spatial representation within the demand data for this study. To-date, the model includes load profiles based on actual historical hourly data for approximately 50 countries and region-specific historical load profiles for 55 sub-regions. This includes data from geographically dispersed load centres around the globe such as Canada, the United States (US), Mexico, Brazil, Russia, South Africa, Japan, South Korea and Australia. The data portal of the European Network of Transmission System Operators (ENTSO-E) includes historical hourly load data for all EU member states, as well as for most non-EU countries connected to the European synchronous grid [44,45]. Data for Ukraine has been retrieved through direct communication with the national system operator (SE NPC Ukrenergo, 29-10-2018). A range of system operators or governing entities provide historical hourly load data on an individual (sub-) country level. A full overview of the existing publicly accessible hourly load data can be found in S5 of the supplementary material [25], with all global demand profiles as used for this study available as a separate file, also from [25]. Details on availability and development of hourly load profiles for all sub-country nodes can be found in S2 of the supplementary material.
Within the available historic data, differences exist that need to be overcome to retain uniformity in the input data for the 2015 model. Not all profiles cover the full electricity system of a country. As a best estimate for hourly demand in the respective country, we scaled the available profiles to 100% of 2015 electricity demand. Furthermore, not all available profiles are based on the 2015 calendar year, hence these profiles have been scaled and shifted to 2015 values. Shifting profiles is required to retain balance in weekdays and weekends while scaling profiles from year to year. Scaling of the hourly profiles occurs linearly with the difference in final demand between the reference year of the data and 2015 as proxy. It has been assumed that there are no changes in relative peak demand. Final electricity demand per country has been determined by multiplying consumption per capita data from the World Bank with the total population, combined with integrating country-level Transmission & Distribution (T&D) losses [46]. All in all, 28 countries did not have a value for electricity consumption per capita. These countries were assigned a value from the nearest neighboring country with similar GDP per capita. This was done manually to verify the consistency of data.
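The scaling and shifting of a load profile to the 2015 calendar year can be sketched as below. The sketch assumes that shifting means rotating the profile by whole days so that weekdays line up with 2015, with the rotated-out days wrapping around to the end of the year; the source does not specify these details, and the function names are hypothetical.

```python
from datetime import date


def scale_profile(hourly_mw, target_annual_mwh):
    """Scale linearly to the target annual demand; relative peak is unchanged."""
    factor = target_annual_mwh / sum(hourly_mw)
    return [value * factor for value in hourly_mw]


def shift_to_target_year(hourly_mw, source_year, target_year=2015):
    """Rotate by whole days so each weekday falls on the same weekday in the
    target year, then trim to the number of hours in the target year."""
    offset_days = (
        date(target_year, 1, 1).weekday() - date(source_year, 1, 1).weekday()
    ) % 7
    shift = offset_days * 24
    rotated = hourly_mw[shift:] + hourly_mw[:shift]
    target_hours = (date(target_year + 1, 1, 1) - date(target_year, 1, 1)).days * 24
    return rotated[:target_hours]


# Synthetic 2014 "profile" whose value is the weekday index of each hour,
# so the alignment can be checked directly (1 Jan 2014 was a Wednesday = 2).
source_2014 = [(2 + hour // 24) % 7 for hour in range(8760)]
shifted = shift_to_target_year(source_2014, source_year=2014)
scaled = scale_profile([1.0, 2.0, 3.0, 4.0], target_annual_mwh=20.0)
```

After the shift, the first hour of the profile carries a Thursday pattern, matching 1 January 2015.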
Countries without available historic hourly demand profiles have been assigned country specific synthetic profiles as developed by Toktarova and colleagues [47]. The authors constructed a calibrated method to generate demand profiles for future years based on locational economic, technical and climatic characteristics. Profiles as developed for 2020 are scaled and shifted to the 2015 calendar year. For a number of smaller countries for which no historical or synthetic profiles were available we assigned profiles from the nearest node with similar GDP per capita.

Net Transfer Capacities
Significant developments in the availability of open data regarding existing high voltage power transmission infrastructure around the globe have occurred in recent years [48,49]. Yet, no complete global dataset exists incorporating cross-border Net Transfer Capacities (NTCs). Hence, for the 2015 global electricity system model NTCs were retrieved through a variety of sources to fill this data gap. NTCs have been applied rather than modelling transmission infrastructure line by line due to restrictions on the availability of data as well as to set a limit on computational complexity of the model simulations. The values represent the technical potential for power flow and do not take into account possible geopolitical or market restrictions on utilization.
As part of a study on indicative scenarios of power plant investments based on potential for electricity trade in the African continent, Taliotis and colleagues [50] composed a dataset with all existing and planned NTCs between adjacent African countries. For the 2015 model we only incorporated the existing lines. The 'Comision de Integracion Energetica Regional' (CIER) published a report in 2016 on the current state of the energy systems within Central and South America, including an overview of the interconnectivity between countries with existing and planned power transmission projects [51]. Similarly, the World Bank analyzed the current power market structure and design of the electricity networks in the Middle East and Northern Africa [52], and an overview of existing grid infrastructure for South-East Asia can be found in [53,54]. For reference NTCs between countries covered by the ENTSO-E we used the 2016 Ten Year Network Development Plan (TYNDP) as background [55]. The 2020 values given per border in [56] were used, with capacities from projects finished after 2015 excluded. Furthermore, the transparency platform of the ENTSO-E provides NTCs [57,58] and hourly exchange values [59] for the majority of pathways within Europe not directly covered by the TYNDP. Finally, a wide range of additional journal papers, reports and other sources contribute to a global dataset of existing cross-border and cross-regional NTCs as of 2015. This is included in S6 of the supplementary material [25], with table S6.1 showcasing NTCs per adjacent pathway as well as the references behind the values. S2 of the supplementary material includes a more detailed description of the approaches used regarding NTCs between sub-country nodes. Fig. 4 highlights the global cross-border transmission pathways with the highest existing NTCs as of 2015.
To-date, pathways with the highest NTCs are mostly used to facilitate supply of surplus electricity from hydro power plants to the power systems of neighboring countries. Examples are the Paraguayan part of the Itaipu plant, mostly used to supply Southern Brazil, and a range of hydro power plants in Mozambique which are being used to supply power-hungry South Africa. Looking past these mostly unilateral flows, Europe is at the forefront of power system integration into a combined market, as reflected by the generally high cross-border transmission capacities.

Fuels and emissions
Fuel prices for standard commodities such as coal and gas were taken from the BP Statistical Review as simplified annual prices at continental level [60]. These can be modified by users if more granular information is available. Oil as a fuel for power generation is most dominant in regions where there is high supply of the raw fuel, e.g. in the Middle East and in countries such as Venezuela. As a result of the local availability, standard commodity prices for oil do not always reflect a realistic fuel price for the power sector in these regions. Multiple iterations in PLEXOS were used to calibrate country-level oil prices that resulted in power plant utilization close to 2015 reported generation values. Carbon pricing is currently not included to retain uniformity in the model for the different continents. To-date, a range of different carbon pricing mechanisms are applied in a number of regions around the world [61]. Power plants based on fossil fuels have limited differentiation in specific fuel types within the WRI database. To reflect the effect of specific sub-categories of fuel groups used within the different continents (e.g. bituminous coal or lignite) on the overall CO2 emissions, continent-specific ratios of CO2 emission per unit of raw fuel (being coal, natural gas or oil) have been incorporated. These were calculated by matching 2015 generation and emission data per larger fuel group as reported by the IEA [29,30,62].

Model calibration and benchmarking
As described in earlier sections of this paper, part of the model input data, such as renewable capacity factors, operational constraints of thermal power plants and fuel prices, has been calibrated to secure model accuracy. This has been done through an iterative process of comparing model simulation output with 2015 benchmark data and calibrating the input data accordingly. Model calibration is important as it allows users to judge the quality of the results against international benchmarks such as those of the IEA. Note that users of the model can ignore the calibration by turning off the specific calibration scenario and reverting to the raw model input. However, we believe it is a helpful asset and gives a more realistic representation of the global power system.
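The iterative compare-and-adjust idea can be sketched as a simple loop. This is only an illustration under stated assumptions: `toy_simulation` is a hypothetical stand-in for a full simulation run, the multiplicative update rule is one possible choice, and none of the names or numbers reflect the actual calibration procedure or PLEXOS functionality.

```python
HOURS_PER_YEAR = 8760


def calibrate_scale(simulate, benchmark_twh, tol=0.01, max_iter=50):
    """Adjust a multiplicative input scale until the simulated generation
    is within `tol` (relative) of the benchmark value."""
    scale = 1.0
    for _ in range(max_iter):
        simulated = simulate(scale)
        if abs(simulated - benchmark_twh) <= tol * benchmark_twh:
            return scale
        # Hypothetical update rule: rescale the input by the benchmark-to-simulation ratio.
        scale *= benchmark_twh / simulated
    raise RuntimeError("calibration did not converge")


def toy_simulation(scale, raw_capacity_factor=0.30, capacity_gw=10.0):
    """Hypothetical stand-in for a model run: annual generation (TWh) of a
    10 GW fleet whose raw capacity factor is multiplied by `scale`."""
    capacity_factor = min(raw_capacity_factor * scale, 1.0)
    return capacity_gw * capacity_factor * HOURS_PER_YEAR / 1000.0


scale = calibrate_scale(toy_simulation, benchmark_twh=21.9)
print(f"calibrated scale: {scale:.3f}")
```

In a real workflow each call to `simulate` would be a complete model run, so the number of iterations matters far more than in this toy example.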
The sources used for the benchmark and calibration are as follows. Annex A of the World Energy Outlook (WEO) [29] provides historical CO2 emissions from power generation for the different continents. Differences in geographical coverage per continent compared to PLEXOS-World (e.g. Turkey is part of 'Europe' within the WEO whereas in PLEXOS-World it is part of 'Asia') have been adjusted by removing or adding calculated country-level power sector CO2 emissions from or to the continental totals. These country-level values were calculated based on IEA's 'CO2 emissions from fuel combustion' [62], which provides historical CO2 emissions per generated kWh per fuel type for a range of countries, multiplied by country-level generation data per fuel type from IEA's 'Electricity Information' [30]. The latter report [30] has also been used to calibrate generation values for most fuel types. Unfortunately, it does not differentiate generation values for solar and wind and does not include data for all countries around the world. Hence, for solar and wind as well as for other renewable technologies where country-level generation data is missing, we used an additional dataset from IRENA [36]. A comparison of the benchmark data with simulation results based on the calibrated model input can be found in section 3.
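The continental adjustment described above amounts to a sum of products followed by a transfer between totals. The sketch below illustrates this; the function names and every number are hypothetical placeholders and are not values from the IEA or WEO datasets.

```python
def country_emissions_mt(generation_twh, intensity_g_per_kwh):
    """Power sector CO2 emissions in Mt: sum over fuel types of
    generation (TWh) multiplied by emission intensity (g CO2 per kWh).

    1 TWh = 1e9 kWh and 1 Mt = 1e12 g, so TWh * g/kWh / 1000 gives Mt.
    """
    return sum(
        generation_twh[fuel] * intensity_g_per_kwh[fuel] for fuel in generation_twh
    ) / 1000.0


def move_country(continent_totals_mt, country_mt, source, target):
    """Return new continental totals with one country's emissions moved
    from the `source` continent to the `target` continent."""
    adjusted = dict(continent_totals_mt)
    adjusted[source] -= country_mt
    adjusted[target] += country_mt
    return adjusted


# Hypothetical country-level inputs (illustrative only).
generation = {"coal": 80.0, "gas": 100.0, "oil": 2.0}        # TWh
intensity = {"coal": 1000.0, "gas": 400.0, "oil": 700.0}     # g CO2 per kWh

country_mt = country_emissions_mt(generation, intensity)
totals = move_country({"Europe": 1300.0, "Asia": 9000.0}, country_mt, "Europe", "Asia")
print(round(country_mt, 1), totals)
```

The global total is unchanged by the transfer, which is a useful sanity check when several countries are reassigned.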

Model availability
The full model (and its future updates) in raw data format as well as the input datasets for PLEXOS-World are available at [25]. We follow the 'FAIR Guiding Principles for scientific data management and stewardship' for dissemination [63], allowing users to modify or recreate the model in other simulation environments. FAIR encourages the findability, accessibility, interoperability, and reuse of digital assets. The principles emphasize machine-actionability, in essence the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention. This matters because of the increasing reliance on computational support to deal with the growing volume, complexity, and creation speed of data.

Results
This section includes a benchmarking exercise in which the calibrated simulation results for the over 30,000 simulated power plants are compared to historical data with 2015 as base year. Benchmarking is undertaken at an aggregated continental and country level and not at plant level, as this model is intended to allow users to examine large-scale and continental power systems. Users have the option to downscale the spatial size of the model simulations, yet would have to undertake their own calibration. Fig. 5 compares the overall generation and CO2 emissions on a continental and global level from the PLEXOS-World simulations with historically reported data. The main observation from the graphs is that both generation and emissions are generally in line with reported data. Small deviations from the reported generation values exist, predominantly in Asia and Europe, which can be the result of a combination of factors.
First, the use of different datasets for input and calibration can lead to small yet unavoidable differences. The overall demand for every country within the model, which determines the required generation, has been based on World Bank data, whereas the reported 2015 generation values are based on IEA and IRENA datasets. Furthermore, although load shedding is not uncommon in mostly developing countries, the limited occurrence of unserved energy (a global total of 92.4 TWh on 24,000 TWh of demand, or roughly 0.4%), especially in sub-country nodes, indicates a possible limitation of the assumption that missing power plant capacities are distributed according to the existing share of capacity per sub-country node within the WRI database. As a result of this assumption, power plant capacity is likely to be slightly underestimated in some sub-country nodes and overestimated in others. Yet, due to a lack of openly available robust datasets including sub-country level power plant capacities, the current approach is near optimal.
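The proportional distribution assumption discussed above can be illustrated with a short sketch. The function name, node labels and capacities are hypothetical; the snippet only shows the general idea of spreading a national capacity shortfall over sub-country nodes according to their existing shares, not the exact procedure used to build the model.

```python
def allocate_missing_capacity(node_capacity_mw, national_total_mw):
    """Scale known node capacities so they sum to the national total,
    distributing the missing capacity in proportion to each node's
    existing share of the known capacity."""
    known = sum(node_capacity_mw.values())
    missing = national_total_mw - known
    if known <= 0 or missing < 0:
        raise ValueError("need positive known capacity not exceeding the national total")
    return {
        node: capacity + missing * capacity / known
        for node, capacity in node_capacity_mw.items()
    }


# Hypothetical country: 1,000 MW listed at plant level, 1,500 MW reported nationally.
known_nodes = {"node-A": 600.0, "node-B": 300.0, "node-C": 100.0}
allocated = allocate_missing_capacity(known_nodes, national_total_mw=1500.0)
print(allocated)
```

If the unlisted 500 MW were in reality concentrated in one node, the proportional split would overestimate capacity in the other two, which is the kind of local mismatch that can show up as unserved energy.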
Finally, apart from the technical potential for power flow, to date no restrictions on the trade of electricity between nodes are implemented in the model, which can lead to overestimation of flows and consequently underestimation of domestic generation. Current model results indicate a significant flow from European nodes to Asia (mostly Russia), contributing to the slight differences with historically reported data in both continents. Comparison of the overall continental emissions with reported data as shown in Fig. 5 tells a similar story: values are generally in line, with small differences mostly resulting from the described differences in required generation.
Fig. 6 shows a more detailed view on both aspects by comparing the historical and simulated generation and emission values per fuel type. More detailed graphs that include comparisons with total emission and generation values per fuel type and continent can be found in S3 of the supplementary material [25]. The generation output of operationally low-cost technologies such as coal, hydro, nuclear, solar and wind has been calibrated at country level through an iterative process to come as close as possible to reported 2015 generation values. This has generally been successful, yet the earlier indicated differences in total generation lead in certain cases to a mismatch in the overall use of peaking power plants based on gas and oil compared to historically reported data. These power plants are generally at the end of the merit order (2015 context with higher gas prices) and are hence dispatched last or switched off first, making them the most susceptible of all power plant types to changes in demand.
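The sensitivity of peaking plants to changes in demand follows directly from merit-order dispatch, which the sketch below illustrates for a single hour. It is a deliberately simplified stand-in for a full unit commitment and dispatch optimization: the plant list, capacities and marginal costs are hypothetical, and transmission, storage and operational constraints are ignored.

```python
def dispatch(plants, demand_mw):
    """Single-hour merit-order dispatch.

    `plants` is a list of (name, capacity_mw, marginal_cost) tuples.
    Plants are loaded from cheapest to most expensive until demand is met.
    Returns (output per plant in MW, unserved demand in MW).
    """
    remaining = demand_mw
    output = {}
    for name, capacity_mw, _cost in sorted(plants, key=lambda plant: plant[2]):
        output[name] = min(capacity_mw, remaining)
        remaining -= output[name]
    return output, remaining


# Hypothetical system (illustrative capacities in MW and marginal costs per MWh).
plants = [
    ("oil", 200.0, 120.0),
    ("hydro", 300.0, 5.0),
    ("gas", 400.0, 60.0),
    ("coal", 500.0, 30.0),
]

base, _ = dispatch(plants, demand_mw=1000.0)
lower, _ = dispatch(plants, demand_mw=900.0)
print(base["gas"], lower["gas"])
```

In this example a 10% drop in demand halves the output of the gas plant while hydro and coal output stay unchanged, which mirrors why deviations in total generation appear mainly in gas and oil utilization.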
Next to an overall deviation in the use of peaking power plants, there is also a slight mismatch in the relative use of oil versus gas in countries where both fuel types compete. The main reason for this mismatch is the approach used to scale missing power plant capacities based on the relative shares of coal, gas and oil in the WRI database for countries where no capacity data is available in the IEA datasets. It is possible that the country-level power plant capacity of a specific fossil fuel is underestimated, meaning that the theoretical generation potential is insufficient to reach the benchmark values. This is especially visible in Africa because Africa is underrepresented in the WRI database relative to other continents. Furthermore, to date secondary fuels for thermal power plants are not incorporated in the model, which affects the use of oil and gas.
These aspects are also visible at country level, as indicated in Fig. 7. Utilization of gas- and oil-based power plants is controlled by means of their fuel prices, with oil prices calibrated at country level to optimize the balance in use of both fuel types compared to historical data. Despite this, in certain cases oil is slightly underutilized in favor of gas and vice versa. Yet, it is important to realize that in absolute terms the role of oil for the purpose of power generation is very limited (see S3 of the supplementary material [25]). Overall deviations in the use of gas compared to the benchmark values are mostly the result of lower or higher required generation in the model. The underutilization of oil in India results from data discrepancies between the different datasets. The IEA reports a gross electricity production from oil in 2015 of almost 23 TWh [30], whereas the diesel-based installed capacity according to India's Central Electricity Authority was 1.2 GW in March 2015 [32] and only 0.99 GW in March 2016 [64]. Even at full utilization this would lead to a maximum generation potential of 8.7-10.5 TWh. The relatively low usage of gas in China is a direct result of the earlier described limitations in sub-country allocation of generator capacities as well as a slightly lower total demand compared to benchmark generation values. That said, the role of gas for power generation in China is limited compared to other fuel types. Beyond gas and oil, the graph shows that country-level total generation as well as generation from baseload and other low-cost technologies is generally in line with historical generation values.
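The upper bound quoted for India can be reproduced with a short calculation, assuming a non-leap year of 8,760 hours and a 100% capacity factor. The function name is illustrative; the capacities and the reported generation are the figures cited above.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year


def max_generation_twh(capacity_gw):
    """Annual generation in TWh if a fleet ran at full output all year."""
    return capacity_gw * HOURS_PER_YEAR / 1000.0


low = max_generation_twh(0.99)   # installed diesel capacity, March 2016
high = max_generation_twh(1.2)   # installed diesel capacity, March 2015
reported_twh = 23.0              # approximate IEA-reported oil generation, 2015

print(f"maximum potential: {low:.1f}-{high:.1f} TWh versus about {reported_twh:.0f} TWh reported")
```

Even the higher bound covers less than half of the reported value, so the gap cannot be closed by calibrating fuel prices alone.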

Discussion
This paper describes the model development of a first-of-its-kind reference detailed hourly global power system model at plant and country level. The model, dubbed PLEXOS-World after the simulation software used, can simulate the dispatch of over 30,000 individual power plants representing 164 countries spread out over 265 nodes. Alongside the existing storage facilities around the world as well as the globally existing cross-border transmission capacities, the model optimizes the supply of electricity to match the system demand by minimizing the overall operational system cost.
We have shown that the model can be a useful tool for the simulation of the global power system through a benchmarking exercise of calibrated simulation results with historical data for 2015. That said, the model is only as strong as its input data and the underlying model assumptions. Significant improvements can still be made, for example regarding the representation of existing power plant portfolios, the level of spatial detail in aspects such as fuel and carbon prices, and by incorporating a wider range of data years for demand and variable renewable profiles [35]. The main strength of the model is therefore not its absolute accuracy but its openness, adaptability and flexibility for other users. All model input is available as supplementary material [25] to allow other users to modify the model in PLEXOS or recreate the model in other simulation environments. This includes a full global dataset of cross-border transmission capacities, hourly demand profiles, and plant-specific capacity factor profiles for existing hydro, solar and wind power plants. The model can be used for assessments on the global scale, but it is just as easy to zoom in on a specific country or area in the world, allowing it to be used for a wide range of research. The model is set up in a straightforward fashion that makes it easy for users to switch to more accurate and detailed data for specific regions while modelling other areas with base data (or excluding them completely).
The study has given us valuable insights into the availability, importance and strength of open data initiatives [24]. Nonetheless, it has also highlighted the data gaps that still exist, especially in areas outside Europe and North America, as well as the general difficulty of dealing with data discrepancies when using multiple large datasets. The study also showcased the clear differences in power plant portfolios and overall power system characteristics in different parts of the world. This latter aspect highlights once again that there is no single uniform pathway in the energy transition and decarbonization of the global power system, underlining the importance of modelling tools like PLEXOS-World to support research in this area.
In future research, the model will be used as a reference model based on which a range of global decarbonization pathways will be assessed. For example, advanced analyses of the concept of a globally interconnected power grid [3,13,17] will be conducted, and known soft-linking techniques [65] will be applied to investigate the technical feasibility of projected power systems in global scenarios as constructed by integrated assessment models.

and Science Foundation Ireland (SFI) MaREI centre (12/RC/2302). Furthermore, we would like to express our gratitude towards Bruce Owen (Manitoba Hydro), David Gernaat (PBL Netherlands Environmental Assessment Agency), Gang He (Stony Brook University), Lynn St-Laurent (Hydro Quebec), Raman Mall (SaskPower) and UKNEGRO for providing operational electricity system data in the context of this study. Finally, we thank Alparslan Zehir (University College Cork) for peer-reviewing draft versions of this paper.