Datasets for a multidimensional analysis connecting clean energy access and social development in sub-Saharan Africa

In this article we present the datasets used to construct a composite indicator, the Social Clean Energy Access (Social CEA) Index, presented in detail in [1]. The article comprises comprehensive social development data related to electricity access, collected from several sources and processed according to the methodology described in [1]. The new composite index includes 24 indicators capturing the status of the social dimensions related to electricity access for 35 sub-Saharan African (SSA) countries. The development of the Social CEA Index was supported by an extensive review of the literature on electricity access and social development, which led to the selection of its indicators. The soundness of its structure was evaluated using correlational assessments and principal component analyses. The raw data provided allow stakeholders to focus on specific country indicators and to observe how scores on these indicators contribute to a country's overall rank. The Social CEA Index also shows the number of best-performing countries (out of a total of 35) for each indicator. This allows different stakeholders to identify the weakest dimensions of social development and thus helps in setting priorities for action and funding for specific electrification projects. The data can be used to assign weights according to stakeholders' specific requirements. Finally, for the case of Ghana, the dataset can be used to monitor the progress of the Social CEA Index over time through a dimension breakdown approach.


© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Specifications Table
Subject: Energy
Specific subject area: Renewable Energy, Sustainability and the Environment
Type of data: Table, Figure
How the data were acquired: Queried from Open Data portals, systematically joined and cleaned. Compiled based on a comprehensive horizon-scanning of data sources that are processed for a composite indicator.
Data format: Raw: Formatted data. Processed and analysed data.
Description of data collection: Raw data was collected by systematic queries. Formatted raw data compilations are utilized for data processing and analyses in the context of the research work.
Data source location: Secondary data in supplementary material.

Value of the Data
• The data is suitable for constructing a composite index encapsulating multiple indicators related to education, health and wealth that are vital in shaping national and international policies supporting electricity access.
• The raw data is made publicly available and represents a unique resource that allows different stakeholders to identify the most appropriate social ecosystem for financing and/or funding decentralized electricity access.
• The data can support stakeholders in monitoring the effects of electricity access programmes on social development by tracking trends over time.
• Stakeholders can tailor the weights assigned to the different dimensions to match their specific requirements. For example, a philanthropic organization may use the Social CEA Index to find regions where funding electricity generation may have the greatest health benefits. Policy organisations can give weight to factors linked to specific policies (poverty alleviation, children's well-being, women's empowerment, etc.) to see the energy implications of those policies.

Objective
In this article we describe the datasets used for the construction of the Social Clean Energy Access (Social CEA) Index, discussed in detail in P. Casati et al. (2023). The Social CEA Index is a novel composite index including 24 indicators capturing the status of specific social dimensions related to electricity access for 35 SSA countries. It was created to identify the most suitable countries for funding and implementing decentralised renewable energy systems, shedding light on the opportunities for improving social conditions through clean electrification. In addition, the dataset has been extended to monitor the Social CEA Index trend over time in Ghana through a dimension breakdown approach. The development of the Social CEA Index was supported by an extensive review of the literature focusing on the relationship between social outcomes and electricity access. A low final index score implies that financing clean electrification programmes is likely to improve specific social outcomes in the identified country. The dataset allows policy makers, not-for-profit organizations, researchers and entrepreneurs to re-use the data to tailor the Index or to analyse trends in individual indicators, streamlining electrification policies in target countries.

Data Description
This article contains the dataset used for the design and development of the Social CEA Index for sub-Saharan African countries. The Social CEA Index is built on 5 main dimensions (Healthcare, Education, Gender equality, Quality of life and Economic development), 12 sub-dimensions and 24 indicators. The dataset thus focuses on the salient social dimensions related to clean electricity access, further emphasizing the increasingly evident interconnections between energy and society. Increasing access to clean and affordable energy services is key to promoting poverty alleviation in SSA, thereby contributing to improved social outcomes (e.g. healthcare, education, gender equality), quality of life and economic development.
The description of the datasets is presented in this article, while raw data are provided in the Supplementary Information. The original research article [1] describes the methodology used to create the Social CEA Index, providing evidence about the status of social factors related to electricity access in SSA. Table 1 summarizes the classification, source, year and description of the 24 indicators composing the Social CEA Index. Table 2 illustrates the correlation between the "Electricity access" indicator and the remaining 23 indicators. Table 3 illustrates the variability of the Social CEA Index under three different stakeholder perspectives.
Table SI.1 shows the description of each indicator and the weights used. Indicators were aggregated according to a weighting system established through a public consultation involving different stakeholders (private sector, public sector and civil society) [17] and the support of internal experts.
Tables SI.2-SI.6 show the methodology used to collect the raw data for the five dimensions of the Social CEA Index, beginning with Healthcare (Table SI.2) and Education (Table SI.3).
Fig. 1 shows the structure of the Social CEA Index. Fig. 2 shows the Social CEA Index scores. Fig. 3 depicts the Principal Component Analysis (PCA) investigating the underlying structure of the index data, in particular to show that all indicators contributed to one key measure of social development. Fig. 4A displays correlational assessments carried out in the COIN tool on the non-imputed datasets; Fig. 4B displays correlational assessments carried out in the COIN tool on the MissForest-imputed datasets. Fig. 5 displays the Social CEA Index scores for Ghana over the selected time frame. Fig. 6 represents the breakdown of the Social CEA Index for Ghana using the attributed weights.

Table 1
Classification, source, year and description of the 24 indicators composing the Social CEA Index. Only a fragment of one row is recoverable here: 2019. Electricity expenditure per day, in US$. Higher levels of electricity expenditure together with high affordability may indicate that the country has a greater proportion of people with affordable electricity access and thus the social impact of further electrification in those areas will be comparatively more limited. Positive.

Table 2
Correlation between ind.17 "Electricity access" and the remaining 23 Social CEA indicators.

Fig. 1.
Structure of the Social CEA Index. Source: Authors' own elaboration.

Table 3
Social CEA Index variability under three different stakeholder perspectives (private sector, international donors and civil society) and according to an equal weights approach.

Fig. 2.
Final Social CEA Index scores. To obtain the final Index score, data intensification, outlier treatment, missing data imputation, data normalization, and indicator weighting and aggregation were carefully carried out.

Experimental Design, Materials and Methods
The Social CEA Index was built in accordance with the "best practice" for composite indicator design outlined by the European Commission's guidance on composite indicators [18]. Its structure was empirically tested and, where possible, improved in terms of accuracy and robustness. Fig. 1 [1] illustrates the structure of the Social CEA Index.
The following steps were completed to ensure that the raw data were appropriate for use in the final Social CEA Index:
1. The structure of the Social CEA composite indicator was determined prior to data selection. This was done through an extensive review of the existing literature on the social impact of electrification in the context of SSA.
2. [...] [15], World Bank (2010) [16] and then grouped according to the identified framework.
3. The datasets were intensified to ensure their comparability across countries, for example by dividing the indicator by the country's population or other metrics.
4. Data processing was then carried out according to [18]. To treat outliers, datasets were winsorized when skew was greater than 2 and kurtosis was greater than 3.5.
5. Countries and indicators with a coverage lower than 63% were removed, and correlational assessments were then conducted to investigate the underlying structure of the index.
6. Missing data were imputed (i.e. replaced with some substitute value to retain most of the information of the dataset) using the MissForest package in the software R, and structural assessments were re-run to ensure that data imputation had not significantly altered the underlying structure of the index.
7. To bring indicators onto a common scale, rendering them comparable, the dataset was normalised using the min-max method of normalisation.
8. Principal component analysis (PCA) was carried out to show that all indicators contributed to one key measure of social development.
9. Finally, indicators were aggregated according to the weighting system established through both the results of a public consultation [17] and the support of internal experts.
Fig. 2 illustrates the final Social CEA Index scores.

Data selection
Data selection was critical in determining the overall quality of the Social CEA Index. Therefore, to ensure that the datasets used to construct the index were not selected based on convenience, a literature review and expert consultations contributed to the development of the hierarchical structure of the index prior to data collection. Indicators were chosen from reliable sources and, where possible, were collected from international organisations working under statistical regulations or codes of conduct. The quality of the indicator raw data was assessed using a combination of criteria outlined by the OECD and the European Commission in the "Handbook on Constructing Composite Indicators" [19]. Each of the main dimensions of the indicator was carefully constructed to align with the overall Social CEA composite indicator.

Initial processing
Once the indicator raw data had been compiled, we ensured that indicators were comparable across SSA countries, which are characterized by diverse population sizes, land areas and natural resources. This required the intensification of appropriate indicators. Datasets were also winsorized, again following the recommendations of the COIN tool for best practice in composite indicator design. This removed the negative impacts of potentially spurious outliers within datasets. Countries missing more than 63% of data across the indicators were removed from the analysis using the COIN tool.
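The outlier treatment described here can be sketched as follows, using the skewness and kurtosis thresholds stated in the methodology (2 and 3.5). This is a minimal illustration, not the COIN tool's implementation: the function names, the use of population moments with excess kurtosis, and the cap of five winsorization rounds are all assumptions made for the example.

```python
from statistics import fmean


def skew_and_kurtosis(values):
    """Population skewness and excess kurtosis.

    Assumption: the COIN tool may use sample-adjusted estimators instead.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0  # constant indicator: nothing to treat
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def winsorize(values, skew_limit=2.0, kurt_limit=3.5, max_rounds=5):
    """Pull the most extreme value to its nearest neighbour, one round at a
    time, while both thresholds are exceeded (max_rounds is hypothetical)."""
    out = list(values)
    for _ in range(max_rounds):
        skew, kurt = skew_and_kurtosis(out)
        if not (abs(skew) > skew_limit and kurt > kurt_limit):
            break
        if skew > 0:  # long right tail: lower the maximum
            top = max(out)
            nearest = max(v for v in out if v < top)
            out = [nearest if v == top else v for v in out]
        else:  # long left tail: raise the minimum
            bottom = min(out)
            nearest = min(v for v in out if v > bottom)
            out = [nearest if v == bottom else v for v in out]
    return out


indicator = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
treated = winsorize(indicator)
```

In this toy series the single extreme value is replaced by the next-highest observation, after which the distribution no longer exceeds both thresholds and the loop stops.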

Structural and correlational assessments
To identify the underlying structure of the social composite indicator, both correlational assessments and principal component analyses (PCA) were conducted. Initial correlational investigations were conducted using the COIN tool [18]. These correlational assessments were undertaken to ensure that indicators within the same sub-dimension were not highly correlated (high positive correlation: +0.5), which would render one of them redundant. The assessment was repeated to ensure that no indicators were negatively correlated with other indicators in their sub-dimension (high negative correlation: −0.5), which would have suggested an inconsistency between the indicators and what was being measured. Indicators that were either positively or negatively correlated with their neighbours were investigated to determine whether there was a theoretical grounding for this. In the Social CEA composite indicator, negative correlations were retained only within the gender equality dimension, and none of these exceeded −0.5. Furthermore, after the structural assessments, four indicators pertaining to the quality-of-life dimension were moved to a new dimension, economic development, thereby addressing the issue of negative correlations.
Particular attention was devoted to evaluating the correlation between ind.17 "Electricity access" and the remaining 23 indicators (Table 2). Correlations were again identified using the COIN tool [18], but in this case +0.3 represented the threshold for high positive correlation and −0.3 for high negative correlation. This analysis was essential to further evaluate synergies between electricity access and social development.
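The threshold-based screening described above can be illustrated with a short sketch. It computes a Pearson correlation between a reference indicator and each other indicator and labels the result against a threshold (0.3 here; 0.5 for the within-sub-dimension checks). The function names, the toy values, the inclusive comparison at the threshold and the pairwise handling of missing values are assumptions for illustration; the original analysis was carried out in the COIN tool.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation over country pairs where both values are present.

    Assumes neither indicator is constant (otherwise the denominator is zero).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def flag_correlations(reference, others, threshold=0.3):
    """Label each indicator by its correlation with the reference indicator."""
    flags = {}
    for name, values in others.items():
        r = pearson(reference, values)
        if r >= threshold:
            flags[name] = "high positive"
        elif r <= -threshold:
            flags[name] = "high negative"
        else:
            flags[name] = "weak"
    return flags


# Hypothetical values for five countries, not taken from the dataset.
electricity_access = [10, 20, 30, 40, 50]
others = {
    "ind_a": [1, 2, 3, 4, 5],
    "ind_b": [5, 4, 3, 2, 1],
    "ind_c": [1, -1, 0, -1, 1],
}
flags = flag_correlations(electricity_access, others)
```

Calling `flag_correlations(..., threshold=0.5)` on indicators from the same sub-dimension would mirror the redundancy check described earlier in this section.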
Finally, PCAs (Fig. 3) were conducted using the software R, in addition to the correlational assessments carried out using the COIN tool, to visualize and better understand the underlying structure of the social composite indicator. In particular, the PCA was undertaken to show that all indicators contributed to one key measure of social development, complementing the qualitative stakeholder suggestions and the literature review. This resulted in a refined composite indicator that was valid both qualitatively and quantitatively.
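As a rough illustration of what "contributing to one key measure" means numerically, the sketch below estimates the share of variance captured by the first principal component of a correlation matrix. The original PCA was run in R; this pure-Python version uses power iteration as a stand-in technique, and the function names, iteration count and toy data are assumptions. It expects complete (imputed) data with no constant columns.

```python
from math import sqrt


def correlation_matrix(columns):
    """Correlation matrix of equal-length, complete, non-constant columns."""
    n = len(columns[0])
    standardised = []
    for col in columns:
        mean = sum(col) / n
        sd = sqrt(sum((v - mean) ** 2 for v in col) / n)
        standardised.append([(v - mean) / sd for v in col])
    k = len(standardised)
    return [
        [sum(a * b for a, b in zip(standardised[i], standardised[j])) / n
         for j in range(k)]
        for i in range(k)
    ]


def first_component(corr, iterations=500):
    """Power iteration: returns (explained variance share, loadings).

    Starts from a uniform vector, which suits indicators that are mostly
    positively correlated; this is an assumption of the sketch.
    """
    k = len(corr)
    v = [1.0 / sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue / k, v


# Three hypothetical indicators that move together across four countries.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40]]
share, loadings = first_component(correlation_matrix(columns))
```

A high share with loadings of the same sign is consistent with the indicators reflecting a single underlying measure; lower shares suggest a multi-dimensional structure.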

Imputation of missing data
The challenge of missing data was also addressed. Two different methods can be adopted for imputing missing values: (I) Multiple Imputation via Chained Equations (MICE); (II) implementation of a random forest algorithm (MissForest). Considering the results obtained in [14] and [20], we decided to implement a random forest algorithm (MissForest), because MissForest makes fewer assumptions about the shape of each dataset and does not require a specific regression model to be specified for imputation.
Structural assessments were then re-run to ensure that data imputation had not significantly altered the underlying structure of the index. Fig. 4A and Fig. 4B show the correlational assessments carried out in the COIN tool on the non-imputed and the MissForest-imputed datasets, respectively.

Normalization
The completed datasets were normalised to ensure comparability between indicators originally existing at different scales and ranges and measured in disparate units. Considering the results provided by [13] and [14], we selected the rescaling or min-max method of normalisation because it preserves the shape of the data distribution for each indicator and does not disproportionately reward or punish exceptional indicator values, in contrast to methodologies using Z-scores.
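A minimal sketch of min-max rescaling is given below. The 0 to 1 output range, the `higher_is_better` flag for reversing indicators whose direction is negative, and the treatment of constant indicators are assumptions made for illustration rather than details confirmed by the source.

```python
def min_max(values, higher_is_better=True):
    """Rescale an indicator to the 0-1 range across countries."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Constant indicator: no spread to rescale (assumed to map to 0).
        return [0.0 for _ in values]
    scaled = [(v - lowest) / (highest - lowest) for v in values]
    if higher_is_better:
        return scaled
    # Reverse the scale for indicators where lower raw values are better.
    return [1.0 - s for s in scaled]


raw = [2, 4, 6]  # hypothetical raw values for three countries
normalised = min_max(raw)
reversed_scale = min_max(raw, higher_is_better=False)
```

Because the transformation is linear, the relative spacing between countries is unchanged, which is the shape-preserving property noted above.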

Aggregation and sensitivity assessments
Indicators were aggregated according to the weighting system developed in [1]. We did not opt for an equal weights approach, because some social indicators have greater importance in directing financing towards decentralised renewable energy systems. The adopted weighting system was therefore based on the results of a public consultation carried out through a survey [17] and on the support of internal experts. Weights were multiplied by the country's score for each indicator, and the scores across all 24 weighted indicators were then summed to produce a country's final index score (Fig. 2). A sensitivity analysis was carried out to check whether the scores (and the associated inferences) were robust to changes in stakeholder perspectives (Table 3) [1].
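The weighted aggregation, and the way a stakeholder could substitute their own weights, can be sketched as follows. The indicator names, scores and weights are hypothetical. Dividing by the total weight is an added assumption so that weight sets that do not sum to 1 remain comparable; with weights summing to 1 it reduces to the plain weighted sum described above.

```python
def composite_score(scores, weights):
    """Weighted sum of normalised indicator scores for one country."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * scores[name] for name in weights)
    return weighted / total_weight


# Hypothetical normalised scores for one country.
scores = {"ind_a": 0.5, "ind_b": 1.0}

# Two hypothetical weighting perspectives.
consultation_weights = {"ind_a": 0.25, "ind_b": 0.75}
equal_weights = {"ind_a": 1, "ind_b": 1}

score_consultation = composite_score(scores, consultation_weights)
score_equal = composite_score(scores, equal_weights)
```

Recomputing the score under each weighting perspective and comparing the resulting country ranks is one simple way to reproduce the kind of sensitivity check summarised in Table 3.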

Social CEA Index in Ghana
Finally, a dataset for analysing the Social CEA Index trend was also developed, in order to assess how the Index evolved over a chosen time frame, including by dimension. The lack of complete time series data for several individual indicators limited the possibility of observing the evolution of the Social CEA Index for all countries. Therefore, only the case of Ghana was analysed. Following the methodology adopted for the construction of the Social CEA Index, data were normalized through the min-max method and the lowest values were assigned to zero; this explains the low starting scores in 2003. Due to data availability issues, the Index includes only 15 of the 24 indicators (Fig. 5). Fig. 6 illustrates the breakdown of the Social CEA Index in Ghana. The size of each coloured block represents the overall weight of a dimension (Healthcare, Education, Gender equality, Quality of life and Economic development), and the size of each square within it represents the weight of an individual indicator.