Selected ‘Starter kit’ energy system modelling data for selected countries in Africa, East Asia, and South America (#CCG, 2021)

Energy system modeling can be used to develop internally-consistent quantified scenarios. These provide key insights needed to mobilise finance, understand market development, infrastructure deployment and the associated role of institutions, and generally support improved policymaking. However, access to data is often a barrier to starting energy system modeling, especially in developing countries, thereby causing delays to decision making. Therefore, this article provides data that can be used to create a simple zero-order energy system model for a range of developing countries in Africa, East Asia, and South America, which can act as a starting point for further model development and scenario analysis. The data are collected entirely from publicly available and accessible sources, including the websites and databases of international organisations, journal articles, and existing modeling studies. This means that the datasets can be easily updated based on the latest available information or more detailed and accurate local data. As an example, these data were also used to calibrate a simple energy system model for Kenya using the Open Source Energy Modeling System (OSeMOSYS) and three stylized scenarios (Fossil Future, Least Cost and Net Zero by 2050) for 2020–2050. The assumptions used and the results of these scenarios are presented in the appendix as an illustrative example of what can be done with these data. This simple model can be adapted and further developed by in-country analysts and academics, providing a platform for future work.


Data accessibility
With the article and in a repository.

Value of the Data
• Can be used to develop national energy system models that inform national energy investment outlooks and policy plans, and that provide insights on the evolution of the electricity supply system under different trajectories.
• Useful for country energy system analysts, policymakers and the broader scientific community as a zero-order starting point for model development.
• Can be used to examine a range of possible energy system pathways, in addition to the case studies given in this study, to further understand the evolution of each country's power system.
• Useful for analysing the power system and for capacity-building activities. The methodology for translating the input data into modeling assumptions for a cost-optimization tool is presented in the appendix; it helps analysts develop a zero-order Tier 2 national energy model [1] (source A) consistent with the U4RIA energy planning goals [2].
• Useful for accelerating teaching activities, consultations and government policy analysis in the energy planning field, as evidenced by research based on these data, including an assessment of wind power in Morocco [3], an assessment of NDC targets in Ghana [4] and an assessment of decarbonisation pathways in Kenya [5].
• By combining secondary data from multiple, diverse sources, the work provides analysts with complete and accessible datasets, helping to overcome the barrier of data inaccessibility.

Data Description
The data provided can be used as input data to develop an energy system model for the included countries in Africa, South America, and Asia. These countries were selected based on geography and data availability. This paper presents selected country-specific data and related aggregated data by region, with an example energy system model in the appendix; however, additional, more comprehensive country-specific datasets are available externally for each country (see Appendix B for links to each available country-specific dataset, which should be consulted by those wishing to use these data for their own country analyses). As an illustration, these data were used to develop an example energy system model for Kenya using the cost-optimization tool OSeMOSYS [6] for 2015-2050. For reference, that model is described in Appendix A, and its data files are available as supplementary materials. The data provided were collected from publicly available sources, including the reports of international organizations, journal articles and existing model databases. The methods of data collection and preparation are described in Section 2 of this article and in a separate article that provides guidance to those wishing to create similar datasets for other countries [7]. The data sources used are listed in Table 1; each data source is assigned a letter code, which is then referred to in the text. The dataset includes the techno-economic parameters of supply-side technologies, installed capacities, emissions factors and final electricity demands.
U4RIA are practical goals designed to improve energy modeling for policy support through guidelines and best practices [2]. The acronym stands for Ubuntu (meaning community focused), Retrievability, Reusability, Repeatability, Reconstructability, Interoperability and Auditability. The datasets and example model move partially towards meeting the U4RIA goals in that:

Table 1 Data sources used in this article. In the text, lettered data sources corresponding to those in Table 1
Table 3 A table showing the estimated installed capacity of different on-grid power plant types in selected countries in East Asia in 2018
Table 4 A table showing the estimated installed capacity of different on-grid power plant types in selected countries in South America in 2018
Table 5 A table showing the estimated installed capacity of off-grid solar PV and hydropower in selected countries in Africa in 2018
Table 6 A table showing the estimated installed capacity of off-grid solar PV and hydropower in selected countries in East Asia in 2018
Table 7 A table showing the estimated installed capacity of off-grid solar PV and hydropower in selected countries in South America in 2018
Table 8 A table showing techno-economic parameters for electricity generation technologies in Africa
Table 9 A

Existing electricity supply system
Various technologies can be used to generate electricity, with some using fuels such as oil or natural gas, and others making use of renewable energy sources, such as hydropower. These electricity generation technologies can be either on-grid technologies, which are generally larger in capacity and supply electricity to the national transmission grid to be transported to consumers, or off-grid technologies, which usually provide electricity directly to the consumer at the site of demand, for example rooftop solar PV panels. The estimated existing electricity generation capacities in each selected country in 2018, divided by technology, are detailed in Tables 2-7 below (sources B-E). The methods used to calculate these estimates are described in more detail in Section 2.1. Data on the installation year of each power plant can be found in the country datasets published on Zenodo (see Appendix B).
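Section 2.1 explains that installed capacity in future years is estimated from each plant's commissioning date and operational life. A minimal sketch of that bookkeeping follows, assuming a plant operates for exactly its operational life and then retires; the `PowerPlant` class, technology codes, plant list and lifetimes are hypothetical illustrations, not values from the tables.

```python
from dataclasses import dataclass


@dataclass
class PowerPlant:
    technology: str          # hypothetical technology code
    capacity_mw: float
    commissioning_year: int


def residual_capacity(plants, operational_life, years):
    """Capacity (MW) still operating in each year, keyed by (technology, year).

    Assumes a plant runs from its commissioning year for exactly
    operational_life[technology] years and then retires.
    """
    result = {}
    for year in years:
        for plant in plants:
            life = operational_life[plant.technology]
            if plant.commissioning_year <= year < plant.commissioning_year + life:
                key = (plant.technology, year)
                result[key] = result.get(key, 0.0) + plant.capacity_mw
    return result


# Hypothetical inputs, for illustration only.
plants = [
    PowerPlant("GAS_CCGT", 300.0, 2000),
    PowerPlant("GAS_CCGT", 150.0, 2015),
    PowerPlant("HYDRO", 500.0, 1980),
]
lifetimes = {"GAS_CCGT": 30, "HYDRO": 60}
capacity = residual_capacity(plants, lifetimes, range(2018, 2051))
print(capacity[("GAS_CCGT", 2018)])  # 450.0
print(capacity[("GAS_CCGT", 2030)])  # 150.0
```

Years in which a technology has no surviving capacity simply have no key, so `capacity.get(key, 0.0)` is a convenient way to read the result.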

Techno-economic data for electricity generation technologies
The techno-economic parameters of electricity generation technologies by region, including costs, operational lives, efficiencies and average capacity factors, are presented in Tables 8, 9 and 10. Two types of costs are considered here: capital costs, which are the initial investment costs for the electricity generation technology, and fixed costs, which are the annual costs of operating and maintaining the technology that do not depend on how much electricity it generates, for example the costs of staffing the power plant or maintaining technical equipment. The efficiency of an electricity generation technology is the ratio of electricity output to energy input, and so indicates how much energy is lost in the conversion process. For example, if a power plant is supplied with two energy units of gas and produces one energy unit of electricity, with the rest of the energy lost as waste heat, the power plant has an efficiency of 50%. The capacity factor is the ratio of the electricity a technology actually generates over a given period to the maximum it could generate if it ran at full rated capacity throughout that period. For example, wind turbines are likely to have a lower capacity factor than gas power plants, as wind turbines can only generate electricity when wind conditions are suitable. Capacity factors for renewable technologies, including wind turbines, solar PV panels and hydropower plants, depend on location, as resource conditions vary with geography.
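The two ratios defined above can be written down directly. This is a minimal sketch; the 100 MW wind farm and its annual output are hypothetical numbers chosen only to illustrate the calculation.

```python
HOURS_PER_YEAR = 8760


def efficiency(energy_in, electricity_out):
    """Fraction of the input energy that is converted to electricity."""
    return electricity_out / energy_in


def capacity_factor(generation_mwh, capacity_mw, hours=HOURS_PER_YEAR):
    """Actual generation divided by the maximum possible at full rated capacity."""
    return generation_mwh / (capacity_mw * hours)


# Gas plant example from the text: 2 energy units in, 1 unit of electricity out.
print(efficiency(2.0, 1.0))  # 0.5
# Hypothetical 100 MW wind farm generating 262,800 MWh in one year.
print(capacity_factor(262_800, 100))  # 0.3
```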
For countries in Africa, cost (capital and fixed), operational life and efficiency data were collected from reports by the International Renewable Energy Agency (IRENA) (sources F-H) and
Table 8 Techno-economic parameters of electricity generation technologies in Africa (sources F, G, P).

Techno-economic data for electricity transmission and distribution
Transmission and distribution systems are used to transport electricity produced by on-grid electricity generation technologies, such as gas power plants, to sites of demand, such as homes and businesses. Transmission systems transport electricity over longer distances at higher voltages, while distribution systems transport it over shorter distances at lower voltages. For countries in Africa, the techno-economic parameters of transmission and distribution technologies are taken from the Reference Case scenario of The Electricity Model Base for Africa (TEMBA) (source Q), which gives estimated transmission and distribution efficiencies projected to 2050, as well as estimated costs and operational lives. The efficiency of transmission and distribution systems is the share of electricity that reaches the end of the network; the remainder is lost in transport, for example as waste heat. For countries in Asia, combined losses in electricity transmission and distribution are estimated based on an International Energy Agency (IEA) dataset presented by Index Mundi (source R), which gives estimated combined losses in 2014. Combined losses were then assumed to fall linearly to 5% by 2050, owing to assumed improvements in the technical operation of these systems and reduced non-technical losses, such as those due to power theft. The combined costs of power transmission and distribution are estimated based on a report by the Economic Research Institute for ASEAN and East Asia (ERIA) (source S), which gives cost estimates for several real-life projects in ASEAN countries. For countries in South America, the efficiencies and costs of power transmission and distribution were taken from the SAMBA dataset (source K), which gives estimated efficiencies by country, including projections to 2063. The estimated combined efficiencies of transmission and distribution in each included country are presented in the following tables.
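The linear fall in combined losses for East Asian countries can be sketched as follows. This is an illustration under stated assumptions: losses are held at the 2014 value for earlier years and at the 5% target after 2050, and the 14% starting value is hypothetical rather than taken from source R.

```python
def td_loss_share(loss_2014, year, target=0.05, start_year=2014, end_year=2050):
    """Combined T&D losses as a share of electricity, falling linearly to target."""
    if year <= start_year:
        return loss_2014
    if year >= end_year:
        return target  # assumption: held at the target after end_year
    fraction = (year - start_year) / (end_year - start_year)
    return loss_2014 + (target - loss_2014) * fraction


def td_efficiency(loss_2014, year):
    """Combined T&D efficiency is one minus the loss share."""
    return 1.0 - td_loss_share(loss_2014, year)


# Hypothetical country with 14% combined losses in 2014.
for y in (2014, 2032, 2050):
    print(y, round(td_loss_share(0.14, y), 4), round(td_efficiency(0.14, y), 4))
```

Halfway through the period (2032), losses sit halfway between the 2014 value and the 5% target.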

Techno-economic data for refineries
Refineries are used to convert crude oil into useful fuels such as gasoline and diesel. Some countries have domestic refinery capacity, meaning they can process domestically-produced or imported crude oil, while others rely on importing oil-based fuels. Domestic refinery capacity in each country is sourced from the McKinsey Refinery Reference Desk (source T). In the example OSeMOSYS model, two oil refinery technologies were made available for investment in the future, each producing different ratios of Heavy Fuel Oil (HFO) and Light Fuel Oil (LFO). Heavy fuel oils are more viscous than lighter fuel oils such as gasoline. The techno-economic data for the two refinery technologies considered are shown in Table 23 .

Fuel prices
Assumed costs are provided for both imported and domestically-extracted fuels, with fuel price projections up to 2050 presented below. These are generic estimates based on an international oil price forecast (source V) and cost estimates for Africa (source G), Asia Pacific (sources W-Y), and South America (sources K, V, Z). A detailed explanation of how these estimates were sourced is provided in Section 2.2 .

Emission factors
Electricity generation technologies fuelled by fossil fuels emit several greenhouse gases over their operational lifetime, including carbon dioxide, methane and nitrous oxide. In these analyses and data kits, only carbon dioxide emissions are considered. These are accounted for using carbon dioxide emission factors assigned to each fuel rather than to each power generation technology. The assumed emission factors are presented in Table 27.
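Because emission factors are attached to fuels, a technology's emissions follow from the fuel it consumes, which is its electricity output divided by its efficiency. A minimal sketch; the 56.1 kt CO2/PJ factor for natural gas is an illustrative assumption and should be replaced with the corresponding value in Table 27, and the two plants are hypothetical.

```python
def fuel_input_pj(electricity_pj, efficiency):
    """Fuel energy (PJ) needed to produce the given electricity output (PJ)."""
    return electricity_pj / efficiency


def co2_kt(fuel_pj, emission_factor_kt_per_pj):
    """CO2 (kt) is attributed to the fuel burned, not to the technology burning it."""
    return fuel_pj * emission_factor_kt_per_pj


# Illustrative emission factor for natural gas (kt CO2 per PJ); not from Table 27.
GAS_EF = 56.1

# Two hypothetical gas plants, each generating 10 PJ of electricity.
for name, eff in [("efficient plant", 0.5), ("older plant", 0.4)]:
    fuel = fuel_input_pj(10.0, eff)
    print(name, fuel, round(co2_kt(fuel, GAS_EF), 1))
```

Both plants share one emission factor; the less efficient plant emits more only because it burns more gas for the same output.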

Renewable and fossil fuel reserves
Tables 28-33 show estimated domestic renewable energy potentials and fossil fuel reserves respectively by country. Sources used for each region are described in Section 2.3 and can be found in the external country-specific datasets produced for each country (see Appendix B ).

Electricity demand projection
Final electricity demand projections from 2015 to at least 2050 are provided for each country. These projections estimate future demand for electricity, taking into account factors such as population growth and industrial activity. For countries in Africa, demand projections were sourced from the reference scenario of the TEMBA study (source N). For countries in Asia, these were sourced from the Business as Usual (BAU) scenario of APEC's 7th Energy Outlook (source W), with growth rates for neighbouring countries and historic consumption (source AL) used to estimate future demand for countries not included in APEC. Demand projections for countries in South America were calculated from the Current Policy Scenario regional demand projections of the OLADE Energy Outlook 2019 (source AM), which were divided by country based on historical consumption data from the IEA (source AL). For more information on the final electricity demand projections, see Section 2. The figures below show the final electricity demand projections for each selected country, by region (Figs. 1-7).

Experimental Design, Materials and Methods
Data were primarily collected from the reports and websites of international organizations, including the International Renewable Energy Agency (IRENA), the International Energy Agency (IEA), UN Stats, Asia Pacific Economic Cooperation (APEC), the Economic Research Institute for ASEAN and East Asia (ERIA), the Latin American Energy Organization (OLADE) and the Intergovernmental Panel on Climate Change (IPCC). Additionally, data were sourced from The Electricity Model Base for Africa (TEMBA) and the South America Model Base (SAMBA), existing OSeMOSYS models of African and South American electricity supply (sources K, Q).

Electricity supply system data
Data on the countries' existing on-grid electricity generation capacity were extracted from the PLEXOS World dataset (sources B-C) using scripts from the OSeMOSYS Global model generator (source AN). PLEXOS World provides data on the capacity and commissioning date of each power plant. These data were used to estimate installed capacity in future years based on the operational life data in Tables 8, 9 and 10. Data on the countries' off-grid renewable energy capacity were sourced from yearly capacity statistics produced by IRENA (source E). Cost, efficiency and operational life data were collected from regional reports by IRENA and ACE and from the SAMBA dataset for South America (sources F, G, I, K), which provide region-specific estimates by technology. IRENA's 2021 report focusing on Eastern and Southern Africa (source F) also provides projections of future cost reductions for renewable energy technologies. These cost projections were used directly for African countries, while the trend for each technology was applied to the current regional cost estimates for East Asia and South America to estimate future cost reductions in those regions. For offshore wind, which is not covered by source F, the cost reduction trend was instead taken from a technology-specific IRENA report on the future of wind (source H). The resulting projections are presented in Tables 11, 12 and 13. Costs were assumed to fall linearly between the data points provided by IRENA and to remain constant beyond 2040, when the IRENA forecasts end (except for offshore wind, for which the IRENA forecast continues to 2050). Fixed costs for renewable energy technologies in each year were estimated as a percentage (1 to 4%, depending on the technology) of the capital cost in that year, following IRENA (source F).
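The cost projection steps above can be sketched as follows. This assumes that "applying the trend" means scaling a region's base-year cost by the reference projection's cost relative to its own first year, that costs are held flat before the first data point, and that the reference points, the 1200 USD/kW regional cost and the 2% fixed-cost share are hypothetical.

```python
def interpolate(points, year):
    """Linear interpolation between {year: value} points, held flat outside their range."""
    pts = sorted(points.items())
    if year <= pts[0][0]:
        return pts[0][1]
    if year >= pts[-1][0]:
        return pts[-1][1]  # e.g. constant beyond 2040 when the forecast ends
    for (y0, v0), (y1, v1) in zip(pts, pts[1:]):
        if y0 <= year <= y1:
            return v0 + (v1 - v0) * (year - y0) / (y1 - y0)


def project_capital_cost(regional_base_cost, reference_costs, years):
    """Scale a regional base-year cost by the relative trend of a reference projection."""
    base_year = min(reference_costs)
    trend = {y: c / reference_costs[base_year] for y, c in reference_costs.items()}
    return {y: regional_base_cost * interpolate(trend, y) for y in years}


def fixed_cost(capital_cost, share):
    """Annual fixed cost as a share (about 1-4%) of the capital cost in the same year."""
    return capital_cost * share


# Hypothetical reference projection (USD/kW) ending in 2040.
reference = {2020: 1000.0, 2030: 700.0, 2040: 600.0}
capex = project_capital_cost(1200.0, reference, range(2020, 2051))
fom = {y: fixed_cost(c, 0.02) for y, c in capex.items()}
print(round(capex[2025], 1), round(capex[2050], 1), round(fom[2050], 2))
```

Working with ratios rather than absolute values lets one reference projection shape the cost path for regions whose starting costs differ.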
Country-specific capacity factors for solar PV, onshore wind and hydropower in all regions were sourced from Renewables Ninja and the PLEXOS-World 2015 Model Dataset (sources B, C, L, M). These sources provide hourly capacity factors for 2015 for solar PV and wind, and 15-year average monthly capacity factors for hydropower. Country-specific capacity factors for offshore wind in Africa were sourced from the TEMBA dataset (sources N, Q), which provides capacity factor estimates for eight timeslices. For countries in East Asia and South America, country-specific capacity factors for offshore wind were estimated from an NREL source that gives the potential wind power capacity in each capacity factor range for each country (source O); a capacity-weighted average was then calculated from these data. Average capacity factors are presented in Tables 14, 15 and 16. These data were also used to estimate capacity factors for the eight timeslices used in the OSeMOSYS model (see Appendix A for details). Capacity factors for other technologies were sourced from reports by IRENA for Africa (sources F, G, J), IRENA and ACE for East Asia (source I), and the SAMBA dataset for South America (source K), which provide generic regional estimates for each technology.
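The capacity-weighted average for offshore wind can be sketched as below. It assumes each capacity factor range is represented by its midpoint, which is our illustrative choice rather than a documented detail of the method, and the two bins are hypothetical.

```python
def capacity_weighted_cf(bins):
    """Capacity-weighted average capacity factor.

    bins: iterable of (capacity_mw, cf_low, cf_high); each bin is represented
    by its midpoint (an assumption made for this sketch).
    """
    bins = list(bins)
    total = sum(cap for cap, _, _ in bins)
    if total <= 0:
        raise ValueError("no potential capacity in any bin")
    return sum(cap * (lo + hi) / 2 for cap, lo, hi in bins) / total


# Hypothetical offshore wind potential: 1 GW at CF 20-30%, 3 GW at CF 30-40%.
print(capacity_weighted_cf([(1000, 0.20, 0.30), (3000, 0.30, 0.40)]))
```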
The costs and efficiencies of electricity transmission and distribution in Africa were sourced from the TEMBA reference case (source N), which provides generic regional cost estimates and country-specific efficiencies that account for expected future efficiency improvements. For East Asia, the combined capital cost of electricity transmission and distribution is estimated as the average of the capital costs reported for nine projects in ASEAN in an ERIA report (source S). The fixed operational cost is assumed to be 2% of the estimated capital cost, as done by ERIA (source S). The combined transmission and distribution losses in East Asian countries in 2014 were sourced from IEA data (source R), and losses were then assumed to fall linearly from their 2014 value to 5% by 2050. For countries in South America, the capital costs, operational lives and efficiencies of electricity transmission and distribution were also taken from the SAMBA dataset (source K), which provides future projections. Techno-economic data for refineries were sourced from the IEA Energy Technology Systems Analysis Programme (ETSAP) (source U), which provides generic estimates of costs and performance parameters, while the refinery options modelled are based on the methods used in TEMBA (source N). Existing domestic refinery capacities across all regions were sourced from the McKinsey Refinery Reference Desk, which lists refineries by country (source T).

Fuel data
For countries in East Asia, fuel prices for crude oil, diesel, fuel oil, natural gas and coal were taken from the APEC Energy Outlook 7th Edition (source W), which provides cost estimates by fuel from 2016 to 2050. APEC provides different natural gas and coal prices for net importers, net exporters and neutral countries, and the relevant prices were used for each country. For countries in Asia, the domestic biomass price was estimated from an ERIA report that gives a local average in Thailand (source X), since this was the most region-specific cost estimate that could be sourced. The imported biomass price is an international average taken from a 2021 biomass markets report by Argus Media (source Y).
For countries in Africa, the crude oil price is based on a global price forecast produced by the US Energy Information Administration (EIA) in 2020, which runs to 2050 (source V). The price was increased by 10% for imported oil to reflect the cost of importation. The imported HFO and LFO costs were calculated by multiplying the oil price by 0.8 and 1.33, respectively, based on the methods used in TEMBA (source Q). The prices of coal, natural gas and biomass in Africa were sourced from a regional IRENA report (source G), which provides generic regional estimates for costs to 2030. Again, a linear rate of change was assumed between data points from IRENA, and the forecast was extended to 2040 using the rate of change between 2020 and 2030. Prices were then assumed constant after 2040. The cost of domestically-produced biomass was increased by 10% to estimate the cost of imported biomass.
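The price rules for Africa can be sketched as follows. The 10% import markup and the 0.8 and 1.33 factors come from the text. Three things are assumptions made for this sketch: the HFO/LFO factors are applied to the forecast oil price itself rather than to the marked-up imported crude price, only IRENA's 2020 and 2030 points are used, and years before 2020 are held at the 2020 price. The biomass prices are hypothetical.

```python
IMPORT_MARKUP = 1.10   # +10% to reflect the cost of importation
HFO_FACTOR = 0.8       # imported HFO relative to the oil price
LFO_FACTOR = 1.33      # imported LFO relative to the oil price


def oil_product_prices(oil_price):
    """Derived prices for one year, in the same units as oil_price.

    Assumption: HFO/LFO factors multiply the forecast oil price directly.
    """
    return {
        "crude_imported": oil_price * IMPORT_MARKUP,
        "hfo_imported": oil_price * HFO_FACTOR,
        "lfo_imported": oil_price * LFO_FACTOR,
    }


def irena_price_path(price_2020, price_2030, years):
    """Linear from 2020 to 2030, extended to 2040 at the same rate, then flat.

    Assumption: years before 2020 are held at the 2020 price.
    """
    slope = (price_2030 - price_2020) / 10.0
    return {y: price_2020 + slope * (min(max(y, 2020), 2040) - 2020) for y in years}


# Hypothetical domestic biomass prices (USD/GJ); imported biomass is +10%.
domestic = irena_price_path(5.0, 6.0, range(2015, 2051))
imported = {y: p * IMPORT_MARKUP for y, p in domestic.items()}
print(domestic[2025], domestic[2040], domestic[2050], round(imported[2050], 2))
print(oil_product_prices(60.0))
```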
For countries in South America, fuel price projections for crude oil were also taken from the same 2020 US EIA international oil price forecast (source V), with the prices for imported HFO and LFO calculated in the same way as for Africa, as described above. Each country's natural gas price forecast was taken from SAMBA, which provides country-specific forecasts to 2063 (source K). The domestic biomass price was estimated based on a UK Government report on international biomass markets (source Z) that includes cost estimates for biomass production in Brazil. This cost was increased by 10% to estimate the price of imported biomass.

Emissions factors and domestic reserves
Emissions factors were collected from the IPCC Emission Factor Database (source AA), which provides carbon emissions factors by fuel.
For countries in Africa, domestic renewable energy potentials for solar PV, Concentrating Solar Power and wind were collected from an IRENA-KTH working paper (source AB), which provides estimates of potential yearly generation by country in Africa. Other renewable energy potentials for countries in Africa were sourced from regional reports by IRENA (sources G, AC, AO) and the World Small Hydropower Development Report (source AD), which provide estimated potentials in MW by country. Estimated domestic fossil fuel reserves for countries in Africa are from the websites of The World Bank and US EIA (sources AH-AI), which provide estimates of reserves by country.
For countries in East Asia, domestic solar PV and onshore wind potentials were primarily collected from an NREL report that provides estimated potential yearly generation at an LCOE below $150/MWh (source AF). For Asian countries not included in that report, domestic solar and onshore wind resources were collected from other NREL datasets, which provide estimates of potential yearly generation by country (sources O, AG). Offshore wind potentials were collected from the NREL wind dataset (source O) where applicable. Other renewable energy potentials in East Asia were sourced from regional reports (sources AE, AP) and the World Small Hydropower Development Report (source AD), which provide estimated potentials by country. Estimated domestic fossil fuel reserves were primarily sourced from the APEC Energy Outlook 7th Edition (source W) or Worldometer (source AJ).
Domestic solar and wind resources were also collected from NREL datasets for countries in South America, which provide estimates of potential yearly generation by country (sources O, AG). Other renewable energy potentials were sourced from a regional report by OLADE (source AM) and the World Small Hydropower Development Report (source AD). Estimated domestic coal and oil reserves were sourced from the SAMBA dataset (source K), while natural gas reserves were sourced from the 2019 BP Statistical Review (source AK), which provide estimates of reserves by country.
For the minority of countries not included in one of the regional and global datasets described above, estimates of domestic renewable energy potential and fossil fuel reserves were extracted from country-specific papers and reports. Analysts wishing to use the country starter datasets should consult the externally hosted data repository and the country-specific preprint article (see Appendix B) to identify exactly which source was used for each country.

Electricity demand data
The final electricity demand projections for countries in Africa are based on the TEMBA Reference Scenario dataset (source N), which provides yearly total demand estimates from 2015 to 2070 under a reference case scenario. Final electricity demand projections for countries in Asia are taken from the BAU projection of the APEC Energy Outlook 7th Edition (source W), which gives total demand estimates every five years from 2015 to 2050; demand is assumed to change linearly between these data points. For Asian countries not included in the APEC Energy Outlook, a demand projection was estimated by applying the trend of the projections for neighbouring countries to the total demand in 2019 from the IEA (source AL). For countries in South America, the final electricity demand projections are based on the Current Policy Scenario of the OLADE Energy Outlook 2019 (source AM), which provides aggregated regional demand projections to 2040. These regional demand projections were divided by country using historical consumption data from the IEA (source AL) and extended to 2070 by extrapolating the growth trend.
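The South American steps (allocate the regional projection by historical consumption shares, then extend the series) can be sketched as below. The text does not specify the extrapolation rule, so this sketch assumes the compound annual growth rate of the last two data points is carried forward; the regional figures and country names are hypothetical.

```python
def split_by_history(regional_demand, historical_consumption):
    """Allocate a regional {year: demand} projection to countries by historical share."""
    total = sum(historical_consumption.values())
    return {
        country: {y: d * hist / total for y, d in regional_demand.items()}
        for country, hist in historical_consumption.items()
    }


def extend_growth(series, end_year):
    """Extend a yearly series to end_year at the compound annual growth rate
    of its last two points (assumed rule; the source only says 'growth trend')."""
    years = sorted(series)
    y0, y1 = years[-2], years[-1]
    rate = (series[y1] / series[y0]) ** (1.0 / (y1 - y0))
    extended = dict(series)
    for y in range(y1 + 1, end_year + 1):
        extended[y] = extended[y - 1] * rate
    return extended


# Hypothetical regional demand (TWh) and historical national consumption.
regional = {2039: 1000.0, 2040: 1050.0}
history = {"Country A": 30.0, "Country B": 70.0}
national = {c: extend_growth(s, 2070) for c, s in split_by_history(regional, history).items()}
print(national["Country A"][2040], national["Country B"][2041])
```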

Ethics Statement
Not applicable.

Funding
As well as support in kind provided by the employers of the authors of this note, we also acknowledge core funding from the Climate Compatible Growth Program (#CCG) of the UK's Foreign, Commonwealth & Development Office (FCDO). The views expressed in this paper do not necessarily reflect the UK government's official policies.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Table A1
Definitions of the three model scenarios.

Fossil Future
No new investments in renewable or nuclear power generation, electric stoves and heating, electric transport or energy efficiency are permitted.

Least Cost
No new investment in nuclear power is permitted. Gradual investment constraints are applied to demand-side fuel-switching and energy efficiency: only up to 5% of each technology's 2050 capacity in a run without demand-side investment constraints can be invested annually. No additional constraints are applied to find the cost-optimal solution.

Net Zero by 2050
Domestic production and imports of fossil fuels and biomass gradually decline to 0 in 2050, beginning in 2021, leading to zero carbon emissions by 2050. No new investment in nuclear power is permitted. Gradual investment constraints are applied to demand-side fuel-switching and energy efficiency: only up to 5% of each technology's 2050 capacity in a run without demand-side investment constraints can be invested annually from 2021 to 2039, rising to 10% from 2040 to 2050 to reflect greater ambition.
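The scenario constraints in Table A1 can be expressed as simple bound functions. This is a sketch with two assumptions: the Net Zero limit on fossil fuels and biomass declines linearly from its 2020 level to zero in 2050, since the table says only "gradually", and the 200 MW and 300 PJ inputs are hypothetical. The function names are illustrative, not OSeMOSYS parameters.

```python
def demand_side_investment_cap(scenario, year, capacity_2050_unconstrained):
    """Maximum annual new capacity for a demand-side technology under Table A1."""
    if scenario == "Fossil Future":
        return 0.0  # no new electric stoves/heating, electric transport or efficiency
    if scenario == "Least Cost":
        share = 0.05
    elif scenario == "Net Zero by 2050":
        share = 0.05 if year < 2040 else 0.10  # 5% for 2021-2039, 10% for 2040-2050
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return share * capacity_2050_unconstrained


def net_zero_fossil_limit(level_2020, year):
    """Upper bound on fossil fuel and biomass production plus imports (Net Zero).

    Assumption: linear decline starting in 2021 and reaching zero in 2050.
    """
    if year <= 2020:
        return level_2020
    return level_2020 * max(0.0, (2050 - year) / 30.0)


print(demand_side_investment_cap("Net Zero by 2050", 2035, 200.0))
print(demand_side_investment_cap("Net Zero by 2050", 2045, 200.0))
print(net_zero_fossil_limit(300.0, 2035), net_zero_fossil_limit(300.0, 2050))
```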

Demand-side assumptions
Generic techno-economic data for demand-side technologies (cooking, heating and transport) were used (sources AS-AT). The total final electricity demand projection was split by sector based on the proportions of demand in historical energy balance data (source AL). In each sector, moderate and high energy efficiency technologies were modelled, with input activity ratios of 1 and output activity ratios of 1.15 and 1.3, respectively. This is a simplified way of allowing the model to invest in energy efficiency in each sector, with costs estimated based on the cost of electricity generation by a coal power plant in the model. In the Least Cost and Net Zero scenarios (detailed in Section A2), the speed at which fuel switching and energy efficiency investments can occur is constrained to better align results with reality. This is done by limiting the annual investment in electric vehicles, stoves, heating technologies and energy efficiency to 5% of the 2050 capacity.
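To see why an output activity ratio above 1 represents efficiency, recall the OSeMOSYS convention that a technology's output equals its activity times the OutputActivityRatio, and its input equals its activity times the InputActivityRatio. The sketch below applies that convention; the sectoral demand of 100 units and the "baseline" case with both ratios equal to 1 are hypothetical.

```python
def electricity_input(useful_demand, input_activity_ratio, output_activity_ratio):
    """Electricity a technology must consume to deliver `useful_demand`.

    output = activity * OutputActivityRatio; input = activity * InputActivityRatio.
    """
    activity = useful_demand / output_activity_ratio
    return activity * input_activity_ratio


demand = 100.0  # hypothetical sectoral demand, arbitrary energy units
for label, oar in [("baseline", 1.0), ("moderate efficiency", 1.15), ("high efficiency", 1.3)]:
    print(label, round(electricity_input(demand, 1.0, oar), 1))
```

With these ratios, the moderate and high efficiency options meet the same demand with roughly 13% and 23% less electricity, respectively.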

Time representation and discount rate
Within each model year, four seasons, each with two 12 h dayparts, are defined. Daypart 1 starts at 06:00 and finishes at 18:00, while daypart 2 starts at 18:00 and finishes at 06:00. The seasons are defined so that season 1 runs from December to February, season 2 runs from March to May, season 3 from June to August, and season 4 from September to November. A discount rate of 10% is used.
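The eight timeslices (four seasons, each with two dayparts) can be derived from hourly data as sketched below, for example to average hourly capacity factors. The "S1D1"-style labels are hypothetical, and hours after midnight are assumed to belong to the calendar month in which they fall.

```python
from collections import defaultdict

# Season of each calendar month, as defined in the text (December is in season 1).
SEASON_OF_MONTH = {12: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2,
                   6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4}


def timeslice(month, hour):
    """Hypothetical timeslice label for a month (1-12) and hour of day (0-23)."""
    season = SEASON_OF_MONTH[month]
    daypart = 1 if 6 <= hour < 18 else 2  # daypart 1: 06:00-18:00; daypart 2: 18:00-06:00
    return f"S{season}D{daypart}"


def average_by_timeslice(records):
    """Average hourly values (e.g. capacity factors) within each timeslice.

    records: iterable of (month, hour, value) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, hour, value in records:
        ts = timeslice(month, hour)
        sums[ts] += value
        counts[ts] += 1
    return {ts: sums[ts] / counts[ts] for ts in sums}


print(timeslice(1, 12), timeslice(7, 20), timeslice(12, 5))  # S1D1 S3D2 S1D2
print(average_by_timeslice([(1, 12, 0.6), (1, 13, 0.4), (1, 22, 0.0)]))
```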

A2 Scenario definitions
Three stylized scenarios are modelled: Fossil Future, Least Cost and Net Zero by 2050. These scenarios are defined in Table A1. Nuclear power is not considered in any of these scenarios; however, it can be added using the techno-economic data provided in the main article.

A3 Scenario results for Kenya
The graphs below show selected results for the three modelled scenarios, including yearly electricity generation and supply capacity, fuel use in the transport sector and total annual carbon dioxide emissions for 2020-2050.

A4 Further work
These example results represent zero-order models and were generated using the clicSAND Interface [8] and OSeMOSYS code [6]. Those interested in further developing this work are directed to the external comprehensive country datasets (see Appendix B) and to guidance on model development using clicSAND and OSeMOSYS [9]. Table B1 lists the country-specific datasets that have been created using the data described in this article. For each country, there is a Zenodo dataset, which includes the data in a set of csv tables, and a Research Square preprint article, which describes the data collection process and provides stylised example scenarios created using OSeMOSYS. These can act as the basis for country-level analyses such as those on Morocco [3], Ghana [4] and Kenya [5].

Table B1
External country datasets created using the data described in this article.