Dataset for multidimensional assessment to incentivise decentralised energy investments in Sub-Saharan Africa

In this data article, we present datasets from the construction of a composite indicator, the Photovoltaic Decentralised Energy Investment (PV-DEI) index, presented in detail in [1]. This article consists of comprehensive energy-related data collected in practice from several sources, and of the outputs of the methodology described in [1]. The PV-DEI was designed and developed to measure the multidimensional factors that currently direct decentralised renewable energy investments. The PV-DEI index includes 52 indicators and was constructed because the factors stimulating investment cannot be captured by a single indicator, e.g. competitiveness, affordability, or governance [1]. The PV-DEI index was built in alignment with a theoretical framework guided by an extensive review of the literature surrounding investment in decentralised photovoltaics (PV), which led to the selection of its indicators. The structure of the PV-DEI was evaluated for its soundness using correlational assessments and principal component analyses (PCA). The raw data provided in this article can enable stakeholders to focus on specific country indicators, and on how scores on these indicators contributed to a country's overall rank within the PV-DEI index. The data can be used to weight indicators depending on the specifications of different stakeholders (such as NGOs, the private sector or international institutions).

© 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

Specifications Table
Subject: Energy
Specific subject area: Renewable Energy, Sustainability and the Environment
Type of data: Table, Figure, Spreadsheet
How data were acquired: Queried from Open Data portals, systematically joined and cleaned. Compiled based on a comprehensive horizon-scanning of data sources that are processed for a composite indicator
Data format: Formatted data (Table 1-10 and Tables A1-A10 in Appendix A); Processed and analysed data (Fig. 1

[20] The Global Competitiveness Report 2017-2018. World Economic Forum.
[21] IRENA. Global atlas for renewable energy n.d. https://irena.masdar.ac.ae/gallery/#gallery (accessed January 1, 2017).

Value of the Data
• The data is suitable for constructing a composite indicator for directing/informing decentralised renewable energy investments in Sub-Saharan Africa
• The datasets integrate technological, environmental, social, political and financial indicators for decision support
• The raw data is made publicly available, and is a unique resource which allows stakeholders to examine the specific situations of countries and make comparisons in detail
• Different weights can be applied to the raw data to enable stakeholders to change the importance they place on certain indicators depending on their own specifications (such as from an NGO, private sector, international institution or other perspective)

Data Description
This article contains the data compilations for the design and development of the PV-Decentralised Energy Investment (PV-DEI) Index for Sub-Saharan African countries. The PV-DEI Index is built on 4 main dimensions (Environmental, Social, Political and Financial), 18 pillars, 43 sub-pillars and 52 indicators. In Fig. 1 the size of the coloured square represents the overall weight of a dimension, and the size of each square represents the weight of an individual indicator. Descriptions of the data sets are provided in the data tables for the main indicators of each dimension in this article, while the raw data are provided in tables in the Supplementary Information. The original research article [1] describes the analysis and methodology used to create the PV-DEI Index.
Tables SI.1-SI.4 show the methodology employed to gather the raw data used to compose the PV-DEI Index for the four dimensions: Environmental (Table SI.1), Social (Table SI.2), Political (Table SI.3), and Financial (Table SI.4). Table 2 gathers for each country the market size for decentralised energy options (potential new customers), total investment cost needs, average levelised cost of electricity and total avoided CO2 emissions.

Indicator: PV output (kWh/kWp)
Description: The PV output represents the theoretical average electricity production per year per kWp installed.
Rationale: Its importance is evidenced by its universal inclusion in modelling papers. PV output directly impacts the amount of energy that can be produced and the levelised cost of electricity (LCOE). Thus, like proximity to the current grid, it represents a hard limit to the economic competitiveness of decentralised solar-PV.
Direction: Positive

Indicator: Ind.02 PV output, spatial variability
Source: PVGIS [2], Huld et al. [25]
Year: 2019
Description: Spatial standard deviation in PV output (kWh/kWp).
Rationale: The greater the deviation in solar potential throughout the country's territory, the more intermittent the power supply becomes across the territory once the system is optimised for the best location in terms of PV output.
Direction: Negative

Indicator: Ind.03 Seasonality Indicator
Source: PVGIS [2], Huld et al. [25]
Year: 2019
Description: Standard deviation in PV output across months of the year.
Rationale: In the evaluation by Huld et al. [2, 25], seasonality was the main determinant of the battery size required for decentralised solar-PV systems. The greater the deviation in solar potential throughout the year, the less reliable the power supply becomes year round. This leads to greater reliance on expensive battery storage and/or a larger PV system to deliver the same amount of electricity.

Indicator: Wind resource endowment
Description: Wind resource endowment (TWh per year) for each country [3, 22].
Rationale: Rather than competing with solar, wind resources can be used in conjunction with solar PV to increase the reliability of power supply by utilising two intermittent supplies rather than one.
Direction: Positive

Indicator: Grid penetration
Source: Szabo et al. [5], NASA [6], JRC-GHSL [26]
Year: 2019
Description: Grid penetration is aggregated at the country level and is the percentage of a country's population living close to the existing electricity grid (inside the 5 km zone) and/or in zones where there is already light. It is calculated using the GIS model of the electricity grid within SSA countries, establishing 5 km buffer zones around where the grid exists and/or where nightlight data indicates that the grid exists (methods section), then calculating the number of people residing inside this exclusion zone and dividing by the total population of the country.
Rationale: In the model used to calculate grid penetration [5], decentralised solar-PV was unable to compete with grid-connected incumbents in areas proximate to the current grid due to infilling and cost-competitive extension of the existing infrastructure. Thus, this exclusion zone represented a hard limit where renewable technologies were unlikely to be economically competitive. The extent of the current grid also indicates an established reliance on incumbent technologies, which may be challenging to displace for sociocultural reasons. The indicator is negatively weighted because the more people live inside the existing grid zone, the less relevant decentralised technologies are compared to expanding last-mile grid coverage, and the fewer rural populations need decentralised technologies.
Direction: Negative

Indicator: Distance from population centres to the grid
Source: [5], NASA [6], JRC-GHSL [26]
Year: 2019
Description: Distance from population centres to the grid. The locations of population centres (JRC-GHSL [26]) without access to electricity were established using the population outside the grid buffer and without nightlight.
Rationale: Weighted positively: the further population centres are from the grid, the more expensive it will be for grid electrification to reach them and the more important decentralised solar-PV options will be.

Indicator: Rural population
Description: Rural population as a percentage of the total population.
Rationale: The larger the rural population within a country, the greater the potential for decentralised solar-PV solutions to help a significant number of people (even if electrified, current rural electricity solutions tend to be expensive, sometimes dangerous and often unreliable).
Direction: Positive

Indicator: Respiratory disease incidence
Description: Respiratory disease incidence per 100,000 population.
Rationale: The higher the incidence of respiratory disease, the more beneficial decentralised renewable energy solutions may be, both in terms of electrifying health centres that target respiratory disease and in terms of replacing dirty fuels known to cause respiratory disease.
Direction: Positive

Year: 2017
Description: Ratio of the percentage of the female labour force population aged 15 and older which is not in paid employment or self-employed, but is available for work and is actively seeking paid employment or self-employment, to the percentage of the male labour force population aged 15 and older in the same status.
Rationale: A higher score reflects greater female emancipation within the labour market, and thus a lower potential impact of electricity provision for improving female liberty.
Year: 2018
Description: Corruption, defined as the risk that individuals/companies will face bribery or other corrupt practices to carry out business, from securing major contracts to being allowed to import/export a small product or obtain everyday paperwork. This threatens a company's ability to operate in a country, or opens it up to legal or regulatory penalties and reputational damage.
Direction: Positive

Indicator: Ind.24 Publicised Laws
Source: World Justice Project [18]
Year: 2019
Description: Data were taken directly from the World Justice Project [18] Open Governance indicator category, from the sub-indicator titled 'Publicized Laws and Government Data'. This measured: 'Whether basic laws and information on legal rights are publicly available, presented in plain language, and made accessible in all languages. It also measures the quality and accessibility of information published by the government in print or online, and whether administrative regulations, drafts of legislation, and high court decisions are made accessible to the public in a timely manner'.
Direction: Positive

Source: Ondraczek et al. [27]
Year: 2014
Rationale: Ondraczek et al. [27] found that high financing costs were a critical barrier to investment in LEDCs. High financing costs were associated with a lower likelihood of investment in solar-PV projects.

Source: [20]
Rationale: This indicator rewards countries where the share of foreign investment in renewables asset finance is the highest.
Direction: Positive

* i.e. total number at country level - Country Score - Country Average Value - % of land/population - per capita.
** Most recent reported figure since 2010.

Table 2. Market size for decentralised options, total investment cost needs, average levelised cost of electricity (LCOE) and total avoided CO2 emissions. The market size represents the population living in areas favourable to decentralised energy options (more than 5 km from the existing grid and with no lighting). The market size is split between the two main options: PV mini-grids (higher population density) and stand-alone systems (more dispersed population). The total investment costs (NPV) are calculated by aggregating the total cost of decentralised energy options, taking into account the optimised size of the system for each location and the specific load consumption per decentralised system zone (aggregation of cells), the population density and economies of scale (lower upfront cost for larger systems). The LCOE is calculated as an average of the LCOE values per country, taking only the areas covered by decentralised options. The avoided CO2 emissions are calculated by comparison with the emissions of diesel generators. The table is sorted by mini-grid market size, with the colours in the left column indicating the overall ranking group in the PV-DEI index (from green, most favourable, to red, least favourable).

Fig. 1 displays the breakdown of the PV-DEI index for Congo as an example of the weight of each dimension and indicator.
Fig. 2 shows the PV-DEI index variability under three different perspectives: private sector, civil society, and international donors. The baseline scenario is determined by the Principal Component Analysis.
Fig. 3 depicts the overall investment costs (NPV), i.e. the total amount of investment in PV decentralised options per country.
Fig. 5. (A) Correlational assessments carried out in the COIN tool on the non-imputed data sets. (B) Correlational assessments carried out in the COIN tool showing results from one of the 5 MICE imputed data sets. (C) Correlational assessments carried out in the COIN tool on the MissForest imputed data sets.
Fig. 6. (A) PV-DEI scores calculated using the pooled results of the 5 MICE() imputed data sets. (B) PV-DEI scores calculated using the MissForest() imputed data.

Fig. 1 shows the breakdown of the PV-DEI index for Congo as an example of the weight of each dimension and sub-indicator. Fig. 2 illustrates the sensitivity analysis investigating whether the scores and/or their associated inferences are robust with respect to changes in the weighting systems indicative of different stakeholder perspectives [28, 29]. Fig. 3 depicts the estimated investment needs for decentralised solar-PV in a country. These represent the total amount of investment in solar-PV decentralised technologies per country (if all the mini-grid investments recommended using the analysis of the PV-DEI Index were undertaken). The overall investment costs are calculated by aggregating the costs of each PV mini-grid at national level [1]. Under the private investment approach, the PV-DEI index made it possible to estimate the overall investment cost for each country. For the top three PV-DEI countries, the overall investment costs were approximately EUR 890 million for Ethiopia, EUR 550 million for Kenya and EUR 525 million for South Africa. Table 2 summarises the market size for PV mini-grids, which has been calculated for each country accounting for the proportion of the non-electrified population versus the total population and the potential market size for PV decentralised options (potential new customers).

Experimental Design, Materials and Methods
The PV-DEI composite indicator was built in accordance with the 'best practice' for composite indicator design outlined in the European Commission's guidance on composite indicators [24]. The structure of the Index was empirically tested, and improved in terms of accuracy and robustness whenever possible [1, 24]. Fig. 4 illustrates the conceptual and analytical framework. The quality of any composite indicator is determined by the quality of the base data used to populate the index, and by the validity of the processes used in the construction of the index. Consequently, data selection was critical in determining the overall quality of the PV-DEI composite indicator. To ensure that data sets were not selected based on convenience, and that observed data availability could not modify the structure of the PV-DEI index in a post hoc fashion, the structure of the index was determined prior to data selection. This was done through an extensive review of the existing literature on the factors important for the direction of decentralised solar-PV investment. Data were then selected in accordance with the a priori specified PV-DEI index structure.
The search for relevant data proceeded through online search engine enquiries, in addition to more specific searches using resources provided by the World Bank [8, 10, 17], World Health Organisation [11], and the United Nations Development Programme [14]. The quality of the indicator data was assessed using a combination of criteria outlined by the OECD and the European Commission in the 'Handbook on Constructing Composite Indicators' [24]. Thus, data sets were relevant to the overall purpose of the PV-DEI Index, measured within an appropriate timeframe for the phenomenon of interest, appropriately sensitive to slight changes in this phenomenon, interpretable and complete with clear definitions of the items and/or populations studied, coherent across SSA countries, accurate and reliable (Table 1).

Overview: The steps completed to ensure data were appropriate for use in the final composite indicator were as follows:
1. The indicator datasets were initially grouped according to the pre-defined conceptual framework.
2. The datasets were intensified to ensure they were comparable across countries, for example by dividing by a country's population or another indicator-appropriate metric.
3. The indicators were checked for skew and kurtosis. In the COIN tool used for data processing, data sets were considered skewed when skew was greater than 2, and kurtosis was considered high if it was greater than 3.5 [24].
4. Data sets were winsorized when skew was greater than 2 and kurtosis was greater than 3.5.
5. Countries missing more than 65% of data across the indicators were removed.
6. Structural assessments (principal component assessments and correlational assessments) were conducted to investigate the underlying structure of the index.
7. Missing data was imputed using the MissForest package in R.
8. Structural assessments were re-run to ensure data imputation had not significantly altered the underlying structure of the index.
9. Indicator data sets were normalised using the min-max method of normalisation.
10. In the PV-DEI index, indicators were aggregated according to the weighting system devised in [1].
Using the raw data provided in this publication, it is hoped that stakeholders will be able to apply their own weights based on the importance they place on particular indicators.
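The screening rule in steps 3 and 4 can be illustrated with a short sketch. The article's processing was done in the Excel-based COIN tool, so the Python below is only an illustration: it assumes population moments and excess kurtosis (the exact estimator used by the COIN tool is not stated in this article), and the function names are hypothetical.

```python
from statistics import fmean


def skewness(values):
    """Population skewness (third standardised moment). Assumed estimator."""
    mean = fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis (fourth standardised moment minus 3)."""
    mean = fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return 0.0 if m2 == 0 else m4 / m2 ** 2 - 3.0


def needs_winsorizing(values, skew_limit=2.0, kurtosis_limit=3.5):
    """Flag an indicator when both thresholds quoted in the steps are exceeded."""
    return (abs(skewness(values)) > skew_limit
            and excess_kurtosis(values) > kurtosis_limit)


# Hypothetical indicator values: one extreme outlier among 20 countries.
with_outlier = [1.0] * 19 + [100.0]
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
print(needs_winsorizing(with_outlier), needs_winsorizing(symmetric))
```

An indicator flagged in this way would then be a candidate for winsorization before normalisation.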

INITIAL PROCESSING
Once the indicator data had been compiled, data sets were initially intensified following the recommendations of the COIN tool for composite indicator design provided by the European Commission [24]. Data intensification ensured data sets were comparable across countries with diverse population sizes, land areas, and natural resources. Data sets were also winsorized, again following the recommendations of the COIN tool for best practice in composite indicator design. This reduced the negative impact of potentially spurious outliers within data sets. Countries missing more than 65% of data across the indicators were removed from the analysis using the COIN tool.
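As a rough illustration of these initial processing steps, the sketch below removes countries above the 65% missing-data limit and winsorizes the largest value of an indicator. It is not the COIN tool's implementation: the data layout (a dictionary of countries with None for missing values), the function names and the choice to cap only the single largest value are assumptions made for the example.

```python
def drop_sparse_countries(table, max_missing_share=0.65):
    """Keep countries whose share of missing (None) indicators is within the limit."""
    kept = {}
    for country, row in table.items():
        missing = sum(1 for value in row.values() if value is None)
        if missing / len(row) <= max_missing_share:
            kept[country] = row
    return kept


def winsorize_top(values, n_points=1):
    """Replace the n_points largest values with the next-largest value.

    Assumes len(values) > n_points; ties at the top are left unchanged.
    """
    cap = sorted(values)[-(n_points + 1)]
    return [min(value, cap) for value in values]


# Hypothetical countries and indicators, for illustration only.
table = {
    "Country A": {"ind_1": None, "ind_2": None, "ind_3": None, "ind_4": 4.0},
    "Country B": {"ind_1": 1.0, "ind_2": None, "ind_3": None, "ind_4": 2.0},
}
print(sorted(drop_sparse_countries(table)))   # Country A is missing 75% of its data
print(winsorize_top([1.0, 2.0, 3.0, 100.0]))
```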
STRUCTURAL ASSESSMENTS
Structural assessments were then undertaken to assess the underlying structure of the PV-DEI index. Correlational assessments were conducted using the COIN tool to ensure no two indicators within the same sub-pillar were highly correlated (high positive correlation: +0.5), which would render one of them redundant. This was repeated to additionally ensure no indicators were negatively correlated with other indicators in their sub-pillar (high negative correlation: −0.5), which would have suggested an inconsistency between the indicators and what was being measured. The COIN tool operates through Excel, and no coding was required for the correlational assessments.
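Although the correlational assessments required no coding in the COIN tool, the check itself is easy to express in code. The sketch below assumes Pearson correlation and treats the ±0.5 values quoted above as strict thresholds; the indicator names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def flag_pairs(sub_pillar, threshold=0.5):
    """Flag indicator pairs within one sub-pillar that breach the thresholds."""
    flags = []
    for (name_a, a), (name_b, b) in combinations(sub_pillar.items(), 2):
        r = pearson(a, b)
        if r > threshold:
            flags.append((name_a, name_b, "redundant"))
        elif r < -threshold:
            flags.append((name_a, name_b, "inconsistent"))
    return flags


# Hypothetical sub-pillar with four indicators measured on four countries.
sub_pillar = {
    "ind_a": [1.0, 2.0, 3.0, 4.0],
    "ind_b": [2.0, 4.0, 6.0, 8.0],    # moves with ind_a
    "ind_c": [4.0, 3.0, 2.0, 1.0],    # moves against ind_a
    "ind_d": [1.0, -1.0, -1.0, 1.0],  # unrelated to the others
}
for flag in flag_pairs(sub_pillar):
    print(flag)
```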
Principal component assessments were also conducted using the R function prcomp(), to ensure that indicator groupings were consistent with the structure of the underlying data. This resulted in the relocation of Indicator 48, which measures the removal of taxes and tariffs, from the financial dimension to the political pillar that focuses on the creation of a decentralised energy market. The relocation remained in keeping with the conceptual framework of the political dimension. After the completion of the structural assessments, the PV-DEI index went from 55 to 52 indicators.
Imputation of missing data
The imputation of missing data was conducted using two different popular methodologies for data imputation, each requiring a different package in R:
1. Imputation using a random forest algorithm (MissForest)
2. Multiple Imputation via Chained Equations (MICE)
In both cases datasets had been intensified and winsorized, and countries missing more than 65% of data across indicators had been removed. Categorical indicators had been removed, and missing data for these was imputed separately using the mode of the region of Africa in which the country missing data was located. The MissForest() function in R was used first to generate the imputed datasets. The maximum number of iterations to be performed if the stopping criterion had not been met was set at 10. The number of trees to grow in each forest was set to 300. The final datasets were normalised using the min-max method and used to calculate PV-DEI index scores for comparison with the MICE() output in a sensitivity assessment documented in Fig. 6.
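The separate treatment of categorical indicators, imputing a missing value with the most frequent category among countries in the same region, can be sketched as follows. This is an illustrative Python version rather than the authors' R code; the country labels, regions and the tie-breaking rule (first category encountered) are assumptions.

```python
from collections import Counter


def impute_regional_mode(values, regions):
    """Fill missing (None) categories with the mode of the country's region.

    values:  country -> category or None
    regions: country -> region name
    A region with no observed value leaves its missing entries as None.
    """
    modes = {}
    for region in set(regions.values()):
        observed = [values[c] for c in values
                    if regions[c] == region and values[c] is not None]
        if observed:
            modes[region] = Counter(observed).most_common(1)[0][0]
    return {country: value if value is not None else modes.get(regions[country])
            for country, value in values.items()}


# Hypothetical categorical indicator for five countries in two regions.
values = {"A": "yes", "B": "yes", "C": None, "D": "no", "E": None}
regions = {"A": "East", "B": "East", "C": "East", "D": "West", "E": "West"}
print(impute_regional_mode(values, regions))
```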
Following imputation using MissForest(), the MICE() function was used to generate 5 imputed data sets. The number of multiple imputations was therefore set at 5, the method selected was predictive mean matching (PMM), and the maximum number of iterations was set at 50. For each of the 5 data sets, structural assessments were conducted (Fig. 5. C). The 5 datasets were normalised independently using the min-max method and used to calculate 5 separate composite indicator scores. These results were then pooled to create an average PV-DEI index score, enabling a sensitivity assessment comparing the MICE() and MissForest() methods of imputation (Fig. 6).
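The pooling step, averaging the five per-imputation index scores for each country, can be sketched as below. The sketch assumes simple arithmetic averaging, as the text describes; the spread across imputations is an extra diagnostic added for illustration, and all labels and numbers are hypothetical.

```python
from statistics import fmean


def pool_scores(scores_by_imputation):
    """Average each country's score across the imputed data sets."""
    countries = list(scores_by_imputation[0])
    return {c: fmean([run[c] for run in scores_by_imputation]) for c in countries}


def score_spread(scores_by_imputation):
    """Range (max minus min) of each country's score across imputations."""
    countries = list(scores_by_imputation[0])
    return {c: max(run[c] for run in scores_by_imputation)
               - min(run[c] for run in scores_by_imputation)
            for c in countries}


# Hypothetical index scores from 5 imputed data sets.
runs = [
    {"A": 0.2, "B": 0.5},
    {"A": 0.4, "B": 0.5},
    {"A": 0.6, "B": 0.5},
    {"A": 0.4, "B": 0.5},
    {"A": 0.4, "B": 0.5},
]
print(pool_scores(runs))
print(score_spread(runs))
```

A large spread for a country indicates that its score depends heavily on the imputed values rather than on observed data.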
Comparison of MICE and MissForest - Re-running the Structural Assessments
When comparing the imputed data sets with the original data using the COIN tool, it was apparent that the MissForest method of imputation preserved the original relationships between the indicators to a greater extent than the MICE method of imputation (Fig. 5). Thus, the MissForest package in R was used to impute the missing data. The result of our sensitivity assessment comparing the MICE and MissForest imputation methods was in alignment with findings elsewhere that random forest techniques are more appropriate for imputing data in complex data sets than multiple imputation using chained equations [1].
Comparison of MICE and MissForest - PV-DEI index scores
An additional sensitivity assessment was conducted to investigate how the ranking of countries within the PV-DEI index would alter if data was imputed using MICE() as compared to the MissForest() method. As demonstrated in Fig. 6, the ranking of countries within the PV-DEI was reasonably robust to the imputation method selected. With the exceptions of Cameroon, Somalia and the Democratic Republic of the Congo (DRC), which all performed markedly better under MICE() imputation, most countries preserved their relative position between the two methods. Following the evidence provided by the structural assessments documented above, and relying on expert knowledge of the relative attractiveness of Cameroon, Somalia and the DRC for investment, PV-DEI scores obtained following the MissForest() method of imputation were used in the final index.
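A ranking comparison of this kind can be reproduced from any two sets of index scores. The sketch below is a minimal illustration with hypothetical country labels and scores (not the values behind Fig. 6), and it assumes that rank 1 is the highest score and that there are no tied scores.

```python
def ranks(scores):
    """Map each country to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}


def rank_shifts(scores_a, scores_b):
    """Rank under method A minus rank under method B for every country.

    A positive shift means the country ranks better (closer to 1) under B.
    """
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    return {country: rank_a[country] - rank_b[country] for country in rank_a}


# Hypothetical scores under two imputation methods.
method_a = {"X": 0.9, "Y": 0.5, "Z": 0.1}
method_b = {"X": 0.8, "Y": 0.2, "Z": 0.6}
print(rank_shifts(method_a, method_b))
```

Countries with a shift of zero keep their position under both methods; large absolute shifts single out the countries whose ranking is sensitive to the imputation method.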
Normalisation
Following the imputation of missing data, the completed data sets were normalised using the min-max method of normalisation. This is the technique recommended as best practice within the COIN tool, as it is able to preserve the shape of the data distribution for each indicator and does not unduly reward or punish exceptional indicator values. To investigate whether using an alternative popular normalisation technique, the Z-score transformation, would have significantly altered results on the PV-DEI index, sensitivity assessments were conducted comparing index results after normalisation using both min-max and Z-score normalisation techniques. The differences in scores were found to be slight, as visualised in Fig. SI.6.
Aggregation
In the PV-DEI index, indicators were aggregated according to the weighting system devised in [1]. This was based on both expert knowledge obtained from an expert elicitation survey and principal component assessments conducted at the level of the sub-pillars (see Fig. SI.7). However, it is hoped that the raw data provided in this publication will enable stakeholders to apply their own weights and thus create their own PV-DEI indices appropriate for their requirements.
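To show how a stakeholder might apply their own weights to the raw data, the sketch below normalises each indicator with the min-max method, reverses negatively weighted indicators, and combines them as a weighted arithmetic mean. The aggregation formula and the reversal rule (1 minus the scaled value) are assumptions made for illustration, since the weighting system itself is defined in [1]; the indicator names, values and weights are hypothetical. A Z-score function is included for the kind of sensitivity comparison described above.

```python
from statistics import fmean, pstdev


def min_max(values, positive=True):
    """Rescale values to [0, 1]; reverse the scale for negative indicators."""
    low, high = min(values), max(values)
    scaled = [(v - low) / (high - low) for v in values]
    return scaled if positive else [1.0 - s for s in scaled]


def z_score(values):
    """Alternative normalisation: centre on the mean, scale by the std. dev."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]


def aggregate(indicators, directions, weights):
    """Weighted arithmetic mean of min-max normalised indicators per country."""
    total_weight = sum(weights.values())
    scores = {}
    for name, by_country in indicators.items():
        countries = list(by_country)
        scaled = min_max([by_country[c] for c in countries],
                         positive=directions[name] == "positive")
        for country, value in zip(countries, scaled):
            scores[country] = (scores.get(country, 0.0)
                               + weights[name] * value / total_weight)
    return scores


# Hypothetical raw data, directions and stakeholder weights.
indicators = {
    "pv_output": {"A": 1000.0, "B": 1500.0, "C": 2000.0},
    "grid_penetration": {"A": 20.0, "B": 80.0, "C": 50.0},
}
directions = {"pv_output": "positive", "grid_penetration": "negative"}
weights = {"pv_output": 3.0, "grid_penetration": 1.0}
print(aggregate(indicators, directions, weights))
```

Changing the entries of the weights dictionary is all that is needed to express a different stakeholder perspective on the same raw data.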
Sensitivity Assessments
In addition to the sensitivity assessments documented here, further assessments were conducted to investigate the impact of data winsorization (Fig. SI.3) on PV-DEI scores.