Incorporating high-resolution demand and techno-economic optimization to evaluate micro-grids into the Open Source Spatial Electrification Tool (OnSSET)

a KTH Royal Institute of Technology, Department of Energy Technology, Division of Energy Systems Analysis, 114 28 Stockholm, Sweden b Universidad Mayor de San Simón, Facultad de Ciencias y Tecnología, 2500 Cochabamba, Bolivia c University of Liege, Department of Mechanical and Aerospace Engineering, B-4000, Liege, Belgium d Politecnico di Milano, Department of Energy, 20133 Milan, Italy e Loughborough University, Department of Geography, UK f Imperial College London, Centre for Environmental Policy, UK g KU Leuven, Department of Mechanical Engineering, 2440 Geel, Belgium


Introduction
Access to electricity is a fundamental driver to reduce poverty and enable social, economic and human development (Karekezi et al., 2012). Traditionally, electricity access has often been measured as a binary issue of households being connected to electricity or not (International Energy Agency and the World Bank, 2015). However, to accomplish the desired socio-economic development benefits, multiple aspects of energy access (quantity, reliability, affordability, sustainability and safety) should be accounted for when designing electrification solutions for unserved or underserved populations (Bhatia and Angelou, 2015).
Electricity access does not automatically bring the desired socio-economic development benefits to newly electrified populations (Odarno et al., 2017). If poorly designed, a supply system can constrain the productive activity of households, community facilities and enterprises. Regardless of quality, if the services derived from electricity access are expensive, consumers will not be able to afford them. Hence, energy access initiatives should build upon an understanding of the relationship between electricity demand and cost of supply to ensure that electricity access and development become mutually reinforcing endeavours (Odarno et al., 2017).
Many efforts are underway across developing countries to enable policy, regulatory and financial conditions to foster energy access. However, government and utility budgets for electrification are often limited and can only afford to provide access to a limited number of people per year. Therefore, electricity access projects have to prioritize between grid extension and off-grid solutions to meet annual electrification targets. To address this challenge, research organizations have started to develop spatially explicit electrification tools to identify priority areas and techno-economic characteristics for grid-extension and off-grid solutions to assist in decision-making and budget allocations.
Energy for Sustainable Development 56 (2020)
In recent years, electrification planning models associated with Geographic Information Systems (GIS) and remote sensing data have emerged as independent software packages (Moner-Girona et al., 2018). In principle, these modelling tools address large-scale electrification plans (regional, country or continent level) using techno-economic and GIS data to match energy resources with potential energy demands to provide the least-cost electricity service. The Levelized Cost Of Electricity (LCOE) is often used to select the combination of grid-connected and off-grid solutions that can serve as least-cost options in a territory given a demand potential within a specified time horizon.
To ensure accessibility, transparency and reproducibility of this research article (two important features of high-quality scientific research (Pfenninger et al., 2017)), OnSSET was selected among the other electrification modelling tools for being fully open-source. OnSSET is software written in Python, developed by the division of Energy Systems Analysis at the Royal Institute of Technology in Sweden (Mentis et al., 2017), with further advances on grid-extension algorithms and settlement clustering developed by Korkovelos et al. (2019). A range of indicators based on open-source geospatial data, such as proximity to roads, proximity to existing transmission infrastructure, nightlights and population density, is used to determine the initial electrification status. OnSSET also determines the additional capacity and investments required to fulfil electrification targets using various geospatial socio-economic and renewable resource data (Korkovelos et al., 2019).
OnSSET and the aforementioned models evaluate various technology options to identify the least-cost electrification solution in large-scale applications. While the methods vary in complexity among these tools, one common caveat lies in the technical accuracy of micro-grid systems modelling.
Re2nAF sizes mono-source micro-grids (solar with battery, or diesel) assuming a simple daily load demand profile and varying the PV array and battery size (or diesel generator capacity) geographically to satisfy a demand proportional to population size (Szabó et al., 2011). Network Planner considers only diesel micro-grids in its analysis and estimates their size using a simple relation between generation capacity and peak demand data in every node (Ohiare, 2015). REM designs optimal configurations of multi-source micro-grids for a number of representative combinations of consumers, and it approximates micro-grid designs for other combinations of customers by interpolation of existing solutions stored in a lookup table (Ciller et al., 2019). Similar to Re2nAF, OnSSET sizes mono-source micro-grids using a simple energy balance to meet an average peak demand in every settlement (Mentis et al., 2015; Korkovelos et al., 2019).
Adequate micro-grid sizing requires matching unpredictable energy sources with uncertain demands while optimizing for reliability and cost (Mandelli et al., 2016a). Since evaluating every micro-grid candidate (one-by-one optimization) at a large scale is computationally impractical, we develop a modelling framework to bridge the "computational gap" between technically detailed micro-grid systems analyses and large-scale electrification modelling.
An innovative three-step framework is proposed to capture high-resolution peculiarities of electricity demand and to evaluate cost-optimal micro-grid performance in large-scale electrification modelling. To do so, two specialized open-source modelling tools were used to generate surrogate models that automatically evaluate the LCOE in OnSSET. The surrogate models derive from a multivariate regression of micro-grid optimization results as a function of influencing parameters.
The open-source RAMP model (Remote Areas Multi-energy system load Profiles) was used to generate representative load demand profiles using interview-based information for a number of representative settlement archetypes, while the open-source Micro-GridsPy model was used to optimize the size of micro-grid system components. A "Solution Pool" dataset containing optimized solutions for a wide range of possible settlement archetypes and other techno-economic parameters (diesel cost, lost load and capital costs) is used to obtain an adequate size for every micro-grid candidate.
Although the application of the new framework requires additional datasets, the benefit of its implementation lies in the ability to assess complex micro-grid systems in large-scale electrification modelling with increased technical accuracy. Furthermore, it allows the representation of hybrid micro-grid technologies, for which a simplified model would otherwise give inaccurate results.

Case study
The territory of Bolivia covers an area of 1,098,581 km² of unique geography with contrasting climatic zones. Its main altitudinal classification divides the territory into the highlands (up to 6500 m.a.s.l.) and the lowlands (<800 m.a.s.l.). This geographical differentiation was used to characterize the demand of rural populations in Bolivia with distinctive climatic, cultural and socioeconomic characteristics. Climatically, the lowlands of Bolivia are characterized by a monsoon and tropical savannah climate, while the highlands experience large variations, from warm humid subtropical to cold desert climate (Kottek et al., 2006).
Bolivia currently has a population of 11 million inhabitants, of which 67.3% live in urban areas and 32.7% live in rural areas (Instituto Nacional de Estadística, 2018). In less than two decades, the electrification rate in Bolivia increased from 64% in 2000 to 93% in 2018 (MHE, 2014). In the same period, the electrification rate increased from 85% to 98% in urban areas and from 25% to 78% in rural areas. The government of Bolivia has set a goal to reach universal access to electricity by 2025, requiring a national strategy to guide investment needs for grid-extension and off-grid solutions. Fig. 1.a and Fig. 1.b illustrate the population size and electrification rate in the nearly 19,300 communities¹ of Bolivia. The highest concentration of fully electrified communities is found close to the capital cities and to the high-voltage network, mostly in densely populated areas. Small populations, both near and far from the high-voltage grid, have the lowest electrification rates.
The so-called "Isolated Systems" (Sistemas Aislados in Spanish) are decentralized mini-grids. In 2018, the Isolated Systems supplied electricity to nearly 10% of the total electrified households (211 thousand households) and accounted for 6.8% of the total installed capacity.
As mini-grids can vary largely in size (from kilowatts to megawatts), a distinction between mini-grids and micro-grids is made in this article (see "Mini-grids and micro-grids differentiation" section). Micro-grids refer in our study to smaller systems than mini-grids with small non-regulated distribution systems. Existing mini-grids in Bolivia have a size in the order of megawatts with regulated distribution networks. The current mini-grid installed capacity is 180 MW, with an energy mix of 66% gas, 25% diesel, 6% hydropower and 4% solar (Autoridad de Electricidad, 2015). In recent years, several mini-grid systems have been incorporated into the national grid, reducing their carbon footprint by being dispatched as peak technologies (MHE, 2014). Only a handful of micro-grids have been implemented in Bolivia, serving small communities ranging from 125 to 377 households (Balderrama-Subieta et al., 2017).
¹ There are several types of territorial organizations with their own denomination, such as communities, municipalities, provinces and departments. A community is an area whose limits are identifiable and whose authorities are recognized by its inhabitants and by its neighbors.

Population threshold for micro-grid analysis specific to Bolivia
Geo-referenced data from the latest National Census on Population and Households ('Censo Nacional de Población y Vivienda') was available for this study. The Census dataset contains geo-referenced data on number of households, electrification status, electricity source (grid, mini-grid, PV panel, diesel generator) and exact geographic location of 19,280 communities (INE & VMEEA, 2015). Given the availability of this data, the algorithm to determine the initial electrification status from the OnSSET methodology was not used.
A preliminary assessment of the Census database revealed that communities with >550 un-electrified households already have a share of their households connected to the grid. Only 14 communities with >550 un-electrified households do not have any initial connection to the grid. Communities with <50 households have high shares of low-income households. Their demands are therefore too small (as small as 1 MWh per settlement per year) to justify grid-extension until 2025.
These characteristics have direct implications for the least-cost electrification solution. An initial connection to the grid and a high accumulated demand are anticipated to make grid extension the most cost-effective alternative. In contrast, low demand and a large distance (larger than 50 km) from the high-voltage grid make standalone systems the most cost-effective. For communities with high demands and large distances from the grid, either mini-grids or grid extension is the most suitable option (an indicative calibration of these relations is given by Fuso-Nerini et al., 2016).
Given that the Census dataset contains communities with a wide range of sizes, from a handful of households to a few thousand households, we must restrict the range in which micro-grids are assessed. Therefore, our micro-grid systems analysis for Bolivia focuses specifically on communities with a minimum of 50 households (sufficient demand) and a maximum of 550 households (larger communities have an existing connection to the grid). Fig. 2 illustrates the location of the communities within the selected population threshold. Note that the methodology for micro-grids described in the following sections is designed for communities in the range between 50 and 550 households and is not suitable for communities with populations outside this range.

Methods
This section is divided into five sub-sections. "Demand characterization for Bolivia" section details the data available for Bolivia and the assumptions used to characterize and estimate the demand for every community of the National Census database. "Electrification technologies and focus of the study" section introduces the main concepts of the study, the technologies assessed and the population threshold selected to evaluate micro-grids. "OnSSET original algorithms for grid extension, mini-grid and stand-alone systems" section summarizes OnSSET original algorithms for each technology type. "Micro-grid systems design" section describes the methodological additions of the article to model micro-grids. Finally, a description of the cost scenarios assessed is presented in "Cost scenarios" section.

Demand characterization for Bolivia
To characterize the population living in each community of the Census database, socio-economic information derived from poverty maps was used to classify households into two income categories: high-income (HI) and low-income (LI) (GeoBolivia, 2012a). This database provides a georeferenced headcount ratio of the population living under five poverty categories in Bolivia based on the Unsatisfied Basic Needs (UBN) multidimensional method (Feres and Mancero, 2001). According to the UBN classification, two categories correspond to the population living above the poverty threshold and the other three categories correspond to the population living below the poverty threshold. For the sake of simplicity, we re-classified these five categories into two. The percentage of the population living above the poverty threshold was categorized as HI and the remaining population as LI.
A distinction between the electricity demand of HI and LI populations living in densely populated communities (>1000 households) and sparsely populated communities (<1000 households) was proposed. For sparsely populated communities, surveyed data from two existing off-grid micro-grid systems were used to characterize the demand of populations living in the highlands (800-4000 m.a.s.l.) and lowlands (<800 m.a.s.l.) of Bolivia. Data from two survey campaigns in the villages of "El Espino" (19.188°S, 63.560°W) and "Toconao" (23.187°S, 68.004°W) were available at the time of writing and were used as representative of the lowlands and the highlands of Bolivia, respectively. El Espino is a village located in Santa Cruz, Bolivia, while Toconao is a village located in Chile, close to Bolivia. Both systems are described in further detail in Lombardi et al. (2019), Pistolese et al. (2017), and Balderrama-Subieta, Haderspock, Canedo, Renan, and Quoilin (2018). For densely populated communities (>1000 households), electricity demand was approximated by downscaling national-level data (Autoridad de Electricidad, 2015).
In addition to household information, the electricity demand of community facilities (education and health centres) and public services (public lighting) was also collected. Open-source georeferenced data on the location of existing health and education centres in Bolivia are available (GeoBolivia, 2012b). Proximity methods were used to assign the georeferenced health and education centres to the nearest settlement. After compiling this information, the annual energy demand in each settlement was calculated by multiplying the electricity consumption per consumer group by the number of consumers in that group and adding the demand from existing community facilities, as described in Eq. (1). "Annual demand estimation" section in Annex further describes the demand components and "Geospatial datasets and assumptions" section describes the geo-referenced data used in this study.
where $N$ is the number of households, $\%HI$ is the percentage of high-income households, $\%LI$ is the percentage of low-income households, and $D_{HI}$, $D_{LI}$, $D_{\text{health center}}$, $D_{\text{education center}}$ and $D_{\text{public lighting}}$ are the annual demands of a single high-income household, low-income household, health centre, education centre and public lighting, respectively.
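The demand aggregation described above can be illustrated with a minimal Python sketch. It follows the verbal definitions only: the facility counts (`n_health`, `n_education`) and all numerical values are hypothetical placeholders introduced for illustration, and the function is not taken from the OnSSET code base.

```python
def annual_demand_kwh(households, share_hi, share_li, d_hi, d_li,
                      n_health=0, d_health=0.0,
                      n_education=0, d_education=0.0,
                      d_public_lighting=0.0):
    """Annual settlement demand in kWh/year.

    Residential demand is the number of households times the
    income-weighted demand per household; community facilities and
    public lighting are added on top. Facility counts are an assumption
    of this sketch (the text only defines per-facility demands).
    """
    if abs(share_hi + share_li - 1.0) > 1e-9:
        raise ValueError("income shares must sum to 1")
    residential = households * (share_hi * d_hi + share_li * d_li)
    community = n_health * d_health + n_education * d_education
    return residential + community + d_public_lighting


# Illustrative values only (not from the study).
demand = annual_demand_kwh(
    households=100, share_hi=0.25, share_li=0.75,
    d_hi=1200.0, d_li=400.0,
    n_health=1, d_health=2000.0,
    n_education=1, d_education=1500.0,
    d_public_lighting=500.0,
)
print(f"Annual demand: {demand:.0f} kWh")
```

Keeping the income shares as explicit arguments mirrors the HI/LI split derived from the poverty maps, so the same function can be applied to every community record.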

Electrification technologies and focus of the study
Portfolio of electrification technologies
Fig. 3 illustrates the taxonomy of electrification technologies assessed in this study together with their energy source and associated cost components. The technologies assessed in the OnSSET original version are grid extension, mini-grids and standalone systems. No modifications were applied to the algorithms to model these technologies. In this article, we develop a methodology specific for micro-grids. Note that in the original OnSSET tool, no differentiation is made between mini-grids and micro-grids.
The above-mentioned technology options are economically efficient in different settings. Grid extension is advisable in areas close to existing transmission infrastructure, where electricity demand makes economic sense. Mini-grids and micro-grids are often cost-effective in settlements outside the reach of the grid with a sufficient density and diversity of users, where connecting users together is more cost-effective than supplying each with a stand-alone system. Lastly, standalone systems are the most cost-effective electrification solution for remote and sparsely populated areas, offering a limited but life-changing electricity service.
The economics of a technology option in a given settlement depend on site-specific characteristics such as demand, distance to the grid, renewable potential and transportation costs added to the diesel price. This information, together with other technology-specific data, can be used to determine the LCOE of implementing various electrification options to supply identified electricity needs. The least-cost alternative that provides the desired attributes of peak capacity and reliability is advised for investment.

Mini-grids and micro-grids differentiation
A number of technical (Martin-Martínez et al., 2016) and functional (Olivares et al., 2014) definitions to classify mini-grids can be found in the literature, where the prefixes mini, micro, nano and pico are used categorically to specify technical capacities and complexities.
In this study, we differentiate micro-grids from mini-grids based on the size of the population served (see "Population threshold for micro-grid analysis specific to Bolivia" section). Micro-grids refer to smaller units than mini-grids, serving mainly residential and other small community services with small non-regulated distribution systems. Mini-grids serve larger populations and productive activities with a regulated distribution system. Both operate in islanded mode and do not consider possible interactions with the grid.
Specific to our case study, we apply our developments to micro-grids and not to mini-grids since surveyed data on demand was available only for small populations at the moment of writing this article. The analysis could be expanded to model mini-grids when data on demand for larger populations is available.

OnSSET original algorithms for grid extension, mini-grid and stand-alone systems
This section briefly describes the algorithms used to calculate the LCOE and to size the new generation capacity required for grid-extension, mini-grid and standalone systems in the OnSSET model.
OnSSET algorithms balance demand and supply on an annual basis. Demands in every region are proportional to the population density and urbanization rate. Since hourly differences in load demands are not captured, load profiles between small and large populated areas are not distinguished.
Technologies are sized to meet an average peak demand, but the sizing algorithms do not include detailed reliability features, which require higher temporal resolution to represent demand and resource variability. The reader may refer to Mentis et al. and Korkovelos et al. (2019) for a detailed description of the sizing algorithms used for all technologies. Small-scale hydropower potential is estimated following a methodology described in Korkovelos et al. (2018). The LCOE for all technologies is calculated using a mix of techno-economic information on conversion efficiency, capital, fixed and fuel costs (see data assumptions in Table A.7 in Annex).
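For orientation, the LCOE can be sketched in Python as the ratio of discounted lifetime costs to discounted lifetime energy. This is the generic textbook definition under simplifying assumptions (constant annual costs and energy, investment paid in year 0, no salvage value); the function name and the example values are hypothetical and do not reproduce the OnSSET implementation.

```python
def lcoe(investment, om_per_year, fuel_per_year, energy_per_year,
         discount_rate, lifetime_years):
    """Levelized cost of electricity in USD/kWh.

    Assumes the investment is paid up front (year 0) and that O&M cost,
    fuel cost and delivered energy are constant over the project lifetime.
    """
    discounted_cost = investment
    discounted_energy = 0.0
    for year in range(1, lifetime_years + 1):
        factor = (1.0 + discount_rate) ** year
        discounted_cost += (om_per_year + fuel_per_year) / factor
        discounted_energy += energy_per_year / factor
    return discounted_cost / discounted_energy


# Illustrative values only (not from Table A.7).
example = lcoe(investment=50_000.0, om_per_year=1_000.0,
               fuel_per_year=3_000.0, energy_per_year=40_000.0,
               discount_rate=0.08, lifetime_years=20)
print(f"LCOE: {example:.3f} USD/kWh")
```

Discounting the energy as well as the costs is what makes the result comparable across technologies with different cost timing, such as capital-intensive PV and fuel-intensive diesel.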

Grid extension algorithms
The LCOE for grid-extension comprises the cost of electricity generation from the grid-connected power plants and the marginal cost of extending transmission and distribution lines (Mentis et al., 2015). For each un-electrified settlement located within 50 km of the existing and planned high voltage (HV) network, the algorithm examines whether it is less costly to extend the grid by medium voltage (MV) lines than to deploy off-grid technologies (Mentis et al., 2015). This iterative process determines whether the connection of one settlement may lead to the cost-effective connection of neighbouring settlements (all within a 50 km limit from HV lines) (Mentis et al., 2015). Extensions by MV lines for distances longer than 50 km may be limited by techno-economic aspects that are not considered in this model (Korkovelos et al., 2019). A comprehensive description of the sizing algorithms for HV and MV transmission lines, transformers, connections to substations and LV distribution lines is given in Appendix D of Korkovelos et al. (2019).
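The iterative character of the grid-extension check can be illustrated with a strongly simplified Python sketch. It assumes straight-line MV lines, a single hypothetical line cost per km charged entirely to the newly connected settlement, and a precomputed discounted lifetime energy per settlement (`lifetime_kwh`); transformers, substations and capacity limits are ignored, so it should be read as an illustration of the iteration and not as the OnSSET algorithm.

```python
import math

HV_LIMIT_KM = 50.0  # distance limit from HV lines stated in the text


def extend_grid(settlements, mv_cost_per_km, grid_generation_cost):
    """Iteratively connect settlements where grid LCOE beats off-grid LCOE.

    Each settlement is a dict with keys: x, y (km), dist_hv_km,
    lifetime_kwh (discounted lifetime energy), offgrid_lcoe (USD/kWh)
    and electrified (bool). The list is modified in place and returned.
    """
    connected = [s for s in settlements if s["electrified"]]
    changed = bool(connected)
    while changed:
        changed = False
        for s in settlements:
            if s["electrified"] or s["dist_hv_km"] > HV_LIMIT_KM:
                continue
            # MV line length: distance to the nearest connected settlement.
            line_km = min(math.hypot(s["x"] - c["x"], s["y"] - c["y"])
                          for c in connected)
            grid_lcoe = (grid_generation_cost
                         + mv_cost_per_km * line_km / s["lifetime_kwh"])
            if grid_lcoe < s["offgrid_lcoe"]:
                s["electrified"] = True
                s["grid_lcoe"] = grid_lcoe
                connected.append(s)
                changed = True  # a new connection may unlock neighbours
    return settlements
```

The outer loop repeats until no further settlement is connected, which captures how connecting one settlement shortens the line needed to reach its neighbours.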

Standalone systems algorithms
For small domestic consumers, standalone solar PV and diesel generators are often the most cost-effective solution in terms of total investment. These electrification technologies provide a few hours of essential electricity service to power small appliances. However, standalone systems cannot provide electricity with comparable reliability to micro-grid and grid-connected systems. In OnSSET, the costs associated with non-served energy (loss of load) are not assessed for standalone systems. The LCOE calculations for PV-standalone and diesel-standalone systems use location-specific data on annual solar irradiation and diesel costs, respectively (see "Hourly PV energy generation estimates" section in Annex).

Mini-grid algorithms
OnSSET evaluates demand on a yearly basis for specified household consumption levels, but it does not differentiate demands from small and large populations, which often have substantial differences in load demand profiles.
Unlike stand-alone technologies, mini-grids include a distribution network in the settlement. The length of the distribution network is determined with information on the settlement area, electricity demand and peak power demand (Korkovelos et al., 2019). Only mono-source technologies are modelled: solar PV-battery and diesel-only mini-grids.
Fig. 4. Flowchart of the three-step methodology and coupling to OnSSET. For micro-grid LCOE and investment cost calculations in each settlement, low-voltage distribution network costs and household connection costs were added in Step 3 using results from OnSSET mini-grid algorithms.
Mini-grids are sized with a simple energy balance to meet an average peak demand using annual data on demand and renewable resources availability. As previously mentioned, since no detailed reliability considerations are included in the sizing algorithms, intraday and intra-seasonal peculiarities are not captured.
For PV-battery mini-grids, OnSSET estimates the generation capacity required (PV panel) but does not explicitly calculate the size of the battery. Investment costs include the battery cost proportional to the generation capacity. Compared to diesel-only, solar-only mini-grids require large batteries to supply electricity with comparable reliability.

Micro-grid systems design
A three-step methodology was designed to estimate the LCOE of PV-battery, diesel and diesel-solar hybrid micro-grids. A flowchart illustrating the sequence of processes and relevant data used is shown in Fig. 4. Each step is detailed in the following sub-sections and can be summarized as follows:
■ Step 1 uses a bottom-up stochastic model to generate demand load profiles for multiple settlement archetypes.
■ Step 2 optimizes the micro-grid size for scenarios combining the settlement archetypes defined in Step 1 with other parameters that influence the LCOE, such as diesel cost, loss of load and capital costs, among others. Each optimized scenario is stored in a "solution pool" dataset.
■ Step 3 estimates a multivariate linear relation between the influencing factors and the micro-grid LCOE over the solution pool defined in Step 2.
■ The OnSSET coupling starts once the LCOE for all technologies has been calculated. The surrogate model from Step 3 is used to calculate the micro-grid LCOE for each community. If grid-extension is not economically feasible, the LCOE of all other off-grid technologies is compared with the LCOE of micro-grids. The least-cost electrification technology is selected (Fig. 4). Additionally, the size of the micro-grid components is approximated by the closest solution from the "solution pool" dataset in Step 2.
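The coupling logic summarized above can be sketched in a few lines of Python. The function names, dictionary keys and the normalized Euclidean distance used to pick the "closest" solution are assumptions made for illustration; the text does not specify the distance measure used for the look-up.

```python
def select_technology(grid_feasible, offgrid_lcoes, microgrid_lcoes):
    """Return the chosen technology name for one community.

    If grid extension is economically feasible it is selected; otherwise
    the least-cost option among the off-grid technologies and the
    micro-grid variants (from the surrogate model) is returned.
    """
    if grid_feasible:
        return "grid"
    candidates = dict(offgrid_lcoes)
    candidates.update(microgrid_lcoes)
    return min(candidates, key=candidates.get)


def closest_solution(pool, query, scales):
    """Pick the solution-pool entry whose inputs are nearest to `query`.

    `scales` normalizes each parameter so that households and prices
    contribute comparably (an assumption of this sketch).
    """
    def distance(entry):
        return sum(((entry["inputs"][k] - query[k]) / scales[k]) ** 2
                   for k in query)
    return min(pool, key=distance)


# Illustrative values only.
choice = select_technology(
    grid_feasible=False,
    offgrid_lcoes={"pv_standalone": 0.45, "diesel_standalone": 0.60},
    microgrid_lcoes={"microgrid_hybrid": 0.38, "microgrid_pv": 0.50},
)
pool = [
    {"inputs": {"households": 100, "diesel_cost": 0.8}, "pv_kw": 20.0},
    {"inputs": {"households": 300, "diesel_cost": 0.9}, "pv_kw": 65.0},
]
size = closest_solution(pool, {"households": 120, "diesel_cost": 0.9},
                        scales={"households": 500.0, "diesel_cost": 1.0})
print(choice, size["pv_kw"])
```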
Step 1. Hourly load demand profiles
The objective of Step 1 is to generate a set of load demand profiles for a wide range of possible combinations of settlement archetypes. This set of demand scenarios aims at approximating real demands from the Census database. The population threshold for which micro-grids are assessed consists of small populations that are not necessarily un-electrified but do not have a connection to the grid in the base year. For this segment of the population, we assess load demands using interview-based information on available electrical appliances and usage habits from people living in already electrified villages (see "Demand characterization for Bolivia" section) (Mandelli et al., 2016b).
A novel approach developed by Lombardi et al. (2019) generates multi-energy demand load profiles using a stochastic bottom-up model for situations in which interview-based information is available. The RAMP model (Remote Areas Multi-energy system load Profiles) is an expanded stochastic approach that builds upon the concept proposed by Mandelli et al. (2016b).
The RAMP model is used here to generate load demands for a set of village archetypes. It uses information on appliance ownership with defined nominal absorbed power, total functioning time during the day and possible periods of use, in addition to other optional features (e.g. modular duty cycles for selected appliances, cooking cycles and thermal appliances) to generate load profiles. The model was developed using surveyed data from the village of El Espino and validated against measured data from the El Espino micro-grid, showing a good approximation (average NRMSE of 10%). The comprehensive model design and further applications are described in Lombardi et al. (2019).
Assumptions on appliance ownership, electrical cooking and water heating specific to our case study are reported together with usage habits with corresponding timings in "Summary of appliances and using timings in El Espino and Toconao" section in Annex. Stochastic variations between predefined ranges are used to account for uncertainty and random user behavior. Based on this information, the model computes the load demand for an individual settlement.
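The bottom-up stochastic idea can be illustrated with a deliberately reduced Python sketch. It is not the RAMP model or its API: it works at hourly instead of finer resolution, ignores duty cycles, and uses hypothetical appliance data and field names (`power_w`, `hours_per_day`, `window`, `variation`) chosen for this example only.

```python
import random


def daily_load_profile(appliances, households, rng):
    """Return a 24-value list with the aggregate load (W) per hour.

    For every household and appliance, the daily functioning time is
    perturbed within +/- `variation` and the switched-on hours are drawn
    at random inside the allowed usage window [start, end).
    """
    profile = [0.0] * 24
    for _ in range(households):
        for app in appliances:
            start, end = app["window"]
            candidate_hours = list(range(start, end))
            spread = app["variation"]
            hours_on = round(app["hours_per_day"]
                             * rng.uniform(1.0 - spread, 1.0 + spread))
            hours_on = max(0, min(hours_on, len(candidate_hours)))
            for hour in rng.sample(candidate_hours, hours_on):
                profile[hour] += app["power_w"] * app["count"]
    return profile


# Hypothetical appliance set, not the surveyed El Espino/Toconao data.
APPLIANCES = [
    {"name": "indoor bulb", "power_w": 7.0, "count": 3,
     "hours_per_day": 4.0, "window": (18, 23), "variation": 0.2},
    {"name": "television", "power_w": 60.0, "count": 1,
     "hours_per_day": 3.0, "window": (17, 23), "variation": 0.2},
]

profile = daily_load_profile(APPLIANCES, households=50,
                             rng=random.Random(42))
print(f"Peak load: {max(profile):.0f} W")
```

Passing the random generator explicitly keeps each settlement archetype reproducible while still allowing many stochastic realizations per archetype.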
Environmental objectives such as de-carbonization and de-fossilization are relevant for electrification planning. In this regard, this study includes not only electric appliances but also some fuel switching to electricity for simple cooking tasks. Electrical cooking demand was added to only 5% of the high-income households by 2025. For more populated areas (outside the population threshold for micro-grid analysis), electrical cooking was not considered due to ongoing policies for natural gas and LPG fuel intensification in Bolivia.
Building upon the demand characterization described in "Population threshold for micro-grid analysis specific to Bolivia" section, Table 1 describes the demand components with multiple granularities introduced into the RAMP model in Step 1. Fig. 5 further illustrates the set of 330 load scenarios modelled. The granularity of each parameter could be increased or decreased to generate a different set of demand scenarios. While increasing the granularity could give a better approximation of real conditions, it would increase the computational requirements of the optimization carried out in Step 2.
Table 1. Demand scenarios based on available geospatial data for Bolivia.
A two-stage stochastic MILP optimization model developed by Balderrama et al. is used to determine the optimal micro-grid size and architecture, i.e. the one that minimizes the net present cost, under uncertainty in demand and renewable generation (Balderrama-Subieta et al., 2017; Balderrama-Subieta et al., 2019). The Micro-GridsPy model is based on historical monitoring data from an operating micro-grid in Bolivia. The 'two-stage' framework refers to determining the optimum value of first-stage variables under the uncertainty of stochastic parameters in the second-stage scenarios. The first-stage variables are the rated capacities of each energy source, and the second-stage variables are the operation decisions across the components. This approach allows for the design of a micro-grid system flexible enough to accommodate variations in demand and renewable energy availability without compromising the cost to the final user.
The objective function of the sizing model is to minimize the net present cost of the project, as stated in Eqs. (2)-(4). These equations can be applied to the four mini-grid configurations assessed in our study. In these equations, the sub-index y represents every year over the lifetime of the project, s is every scenario, t is every time-step, r is every renewable unit and g is every diesel generator unit. $Inv$ is the total investment cost in USD. $C_{OM}$, $C_{fuel}$, $C_{rep}$ and $C_{LL}$ are the costs in USD/kWh for operation and maintenance, fuel, battery replacement and lost load, respectively. $U_{PV}$, $U_{bat}$ and $U_{ge}$ are the capacities in kW of the solar PV, battery and diesel generator, respectively. $C_{PV}$, $C_{bat}$ and $C_{ge}$ are the capital costs in USD/kW of the PV unit, battery and generator. Finally, $I_{s}^{occurrence}$ is the probability of occurrence attributed to each scenario. For a more detailed description of the sizing algorithms, the reader may refer to Balderrama-Subieta et al. (2019).
When modelling micro-grids, ensuring energy supply for every hour of the year is not common practice. In fact, accepting a small fraction of unmet demand (i.e. a fraction that does not significantly compromise the service) has been shown to lead to important cost savings (Balderrama-Subieta et al., 2018). To account for this, the sizing model includes a loss of load probability (LLP) parameter, allowing the system not to supply part of the demand, as shown in Eq. (5). The loss of load ($E_{LL}$) is considered as another energy source in the energy balance in Eq. (6), where $D$ is the village demand, $E_{R}$ is the renewable energy, $E_{ge}$ is the energy coming from the diesel generator, $E_{bat,ch}$ is the energy charged into the battery, $E_{bat,dis}$ is the energy discharged from the battery, $E_{LL}$ is the energy that cannot be met by the system and $E_{curtailment}$ is the energy that the system cannot store or consume. Additionally, the loss of load has an associated cost $C_{LL}$ in each scenario (see Eq. (2)).
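The loss-of-load constraint and the energy balance described in words above can be written compactly as follows. This is a sketch consistent with the verbal definitions in this section, not a transcription of Eqs. (5) and (6): the subscript LL for lost load, the explicit scenario and time indices, and the inequality form of the LLP constraint are notational assumptions.

```latex
\begin{align}
  \mathrm{LLP} &\ge \frac{\sum_{t} E_{LL}(s,t)}{\sum_{t} D(s,t)} \\
  D(s,t) &= E_{R}(s,t) + E_{ge}(s,t) - E_{bat,ch}(s,t) + E_{bat,dis}(s,t)
            + E_{LL}(s,t) - E_{curtailment}(s,t)
\end{align}
```

In this form the lost load enters the balance with a positive sign, like a generation source, while battery charging and curtailment act as sinks.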
To estimate E_PV, hourly PV energy generation is required for any location. The "Hourly PV energy generation estimates" section in the Annex describes the methodology and data sources used for this purpose. To estimate c_fuel, transportation costs of diesel from major cities to each settlement are added to local market prices following the methodology of Szabó et al. (2013).

Step 3. LCOE surrogate model and look-up result dataset
Sizing a mini-grid for every settlement is computationally impractical due to the vast number of possible combinations of settlement demands, locations and renewable resource availability, among other important techno-economic parameters. Therefore, we propose a mathematical expression that automatically evaluates the LCOE as a function of important influencing factors. A multivariate linear regression is performed over a solution space deriving from the optimization of a combination of parameters. Table 2 details the selected techno-economic parameters, with their respective ranges, that influence the micro-grid optimization results. Combinations of these parameters have been used together with demand scenarios deriving from Step 1 and solar energy output scenarios (see the "Hourly PV energy generation estimates" section in the Annex) as simulation "instances". A Latin hypercube sampling method is selected to define the input space on which the optimization model is run, which avoids running the computationally intensive micro-grid optimization model for every possible combination of parameters.
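The sampling idea can be sketched with the standard library alone. In a Latin hypercube, the range of each parameter is divided into as many equal strata as there are samples, and each stratum is used exactly once per parameter. The parameter names and ranges below are illustrative assumptions and are not the ranges of Table 2.

```python
import random


def latin_hypercube(ranges, n_samples, seed=0):
    """Draw n_samples instances; each parameter range is split into
    n_samples equal strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One uniform draw inside each stratum, then shuffle the strata
        # so that parameters are paired at random across instances.
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


# Hypothetical parameter ranges, for illustration only.
parameter_ranges = {
    "households": (50.0, 550.0),
    "diesel_cost_usd_per_l": (0.18, 1.50),
    "llp": (0.0, 0.05),
}
instances = latin_hypercube(parameter_ranges, n_samples=10)
print(instances[0])
```

Each resulting dictionary would correspond to one "instance" to be passed to the optimization model; compared with a full grid, the number of optimization runs grows with the number of samples rather than with the product of the parameter resolutions.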
For each instance optimized, the LCOE and micro-grid size are calculated in Step 2. To generate the surrogate model, a multivariate linear regression between the LCOE (dependent variable) and the instance parameter entries (independent variables) is performed. Fig. 6 illustrates the sequence of steps to obtain the surrogate models.
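A minimal sketch of the regression step follows, assuming ordinary least squares with an intercept solved through the normal equations. In practice a statistics library would normally be used; the pure-Python solver, the function name and the synthetic data (generated from known coefficients so that the fit can be checked) are assumptions for illustration.

```python
def fit_linear(rows, targets):
    """Ordinary least squares with intercept.

    rows: list of tuples of independent variables (one tuple per instance)
    targets: list of LCOE values (dependent variable)
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(design[0])
    # Augmented normal equations: (X'X | X'y)
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Synthetic instances: (households, diesel cost in US$/liter).
rows = [(50, 0.2), (100, 1.0), (300, 0.5), (550, 1.2), (200, 0.8)]
targets = [0.9 - 0.001 * h + 0.2 * fuel for h, fuel in rows]
coefficients = fit_linear(rows, targets)
print(coefficients)
```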
The surrogate model is presented in Eq. (7), where t specifies the micro-grid technology (solar-only, diesel-only or hybrid), LLP is the loss of load probability in percent, c_fuel is the diesel cost in US$/liter, H is the number of households and GHI is the global horizontal irradiation. c_ge, c_R and c_bat are the capital costs in USD/kW of the generator, PV unit and battery, respectively, and c_0 to c_7 are the coefficients of the regression. For each community i from the Census database, the LCOE of each micro-grid possibility is calculated using Eq. (7). Note that the surrogate models are based on a multivariate regression of datasets resolved with an hourly time resolution; thus, they inherit the critical information associated with the time component despite not explicitly carrying it.
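To make the use of the surrogate concrete, the sketch below evaluates a linear expression with eight coefficients for each technology and returns the cheapest option for one community. The order in which the variables are paired with c_0 to c_7, the function names and the coefficient values are assumptions; the actual coefficients are those reported in the Annex.

```python
def surrogate_lcoe(coefficients, llp, fuel_cost, households, ghi,
                   gen_capex, pv_capex, battery_capex):
    """Evaluate a linear surrogate with intercept c_0 and slopes c_1..c_7."""
    c0, c1, c2, c3, c4, c5, c6, c7 = coefficients
    return (c0 + c1 * llp + c2 * fuel_cost + c3 * households + c4 * ghi
            + c5 * gen_capex + c6 * pv_capex + c7 * battery_capex)


def least_cost_technology(models, **community):
    """Return (technology, lcoe) with the lowest surrogate LCOE."""
    results = {tech: surrogate_lcoe(coeffs, **community)
               for tech, coeffs in models.items()}
    tech = min(results, key=results.get)
    return tech, results[tech]


# Placeholder coefficients, NOT the fitted values of the study.
models = {
    "diesel-only": (0.30, 0.0, 0.40, -0.0002, 0.0, 0.0001, 0.0, 0.0),
    "hybrid":      (0.35, 0.0, 0.20, -0.0002, 0.0, 0.0001, 0.0, 0.0),
}
choice = least_cost_technology(
    models, llp=0.0, fuel_cost=1.0, households=100, ghi=2000.0,
    gen_capex=1000.0, pv_capex=1500.0, battery_capex=500.0,
)
print(choice)
```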
Unlike for the LCOE, surrogate models cannot be used to approximate the size of each micro-grid component in a given community i, since the sizing of the generation and storage units comes from an optimization model. Therefore, a look-up algorithm consisting of the following steps was designed. A "solution pool" dataset containing the input information of each scenario modelled (q scenarios in Fig. 6) and its solution output (battery, generator and PV nominal capacities) was set up. For each settlement i from the Census database, the size of the micro-grid components was selected from the most similar scenario available in the Solution pool dataset. Fig. 7 illustrates the procedure followed. Data on demand and techno-economic parameters are stored as feature elements for each settlement from the Census database and the Solution pool dataset. Principal component analysis (PCA) is used to project this information onto a lower-dimensional sub-space while preserving most of the variance in the data. From the lower-dimensional subspace (with the principal components containing most of the variance), the Euclidean distance ('distance measure') is calculated to unify these principal components into a one-dimensional indicator. This procedure is applied separately to each community from the Census database and the Solution pool dataset. In the solution pool, each "distance measure" indicator is associated with its design parameter results (battery, generator and PV nominal capacities).
To obtain the micro-grid size in each community i from the Census database, the distance measure of community i is compared to the set of distance measures from the solution pool. The scenario that best approximates the distance measure (nearest key merge) is selected from the Solution pool for community i. This procedure is applied to all communities from the Census database within the size threshold and without initial connection to the grid (see the "Population threshold for micro-grid analysis specific to Bolivia" section).
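The final matching step can be sketched as a nearest-key search on a sorted list. The snippet assumes that the PCA scores have already been computed upstream, and that the one-dimensional "distance measure" is the Euclidean norm of those scores; both points, together with the names and the pool entries, are assumptions for illustration.

```python
import math
from bisect import bisect_left


def distance_measure(scores):
    """Collapse principal-component scores into one indicator
    (assumed here to be the Euclidean norm of the scores)."""
    return math.sqrt(sum(s * s for s in scores))


def nearest_scenario(pool, distance):
    """Return the sizing result whose key is closest to `distance`.

    pool: list of (distance_measure, sizes) tuples sorted by distance_measure.
    """
    keys = [key for key, _ in pool]
    pos = bisect_left(keys, distance)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = pool[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda item: abs(item[0] - distance))[1]


# Hypothetical solution pool: capacities in kW (PV, generator) and kWh (battery).
pool = [
    (0.8, {"pv": 10, "generator": 5, "battery": 20}),
    (1.5, {"pv": 25, "generator": 10, "battery": 60}),
    (3.2, {"pv": 60, "generator": 30, "battery": 150}),
]
community_scores = [1.14, 1.52]          # hypothetical PCA scores
sizes = nearest_scenario(pool, distance_measure(community_scores))
print(sizes)
```

Because the indicator is one-dimensional, the search reduces to a sorted look-up, which keeps the matching inexpensive even for a large Census database.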

Cost scenarios
To account for the steady improvements in renewable technology costs and the uncertainty about the continuity of fossil fuel subsidies, cost scenarios (fuel and capital costs) were defined to further discuss the results of our analysis. Two existing subsidy schemes in the Bolivian electricity sector were included as scenarios together with international diesel prices. Fig. 8 illustrates the modelled scenarios and Table 3 describes the values used in each of them.

Results
This section presents the results in three subsections. The "Simulated loads, micro-grid optimization and regression results" section describes the simulation results for the demand load curves, micro-grid optimization and surrogate models. The "Electrification results" section describes the electrification results for the modelling period 2012-2025 and compares the results deriving from the enhanced OnSSET version with the original version. The "Sensitivity analysis: considering cost-scenarios" section reports an analysis deriving from the cost scenarios.

Simulated demand loads
A total of 330 load profiles were generated, building upon interview-based information from two representative systems in the highlands and lowlands of Bolivia. The simulated demand loads are shaped mainly by household activity, as is characteristic of rural demand profiles, with substantial differences resulting from appliance ownership and household activity patterns. Fig. 9 illustrates the aggregated load for selected population sizes and socio-economic mixes. Independently of the size and location, a midday peak demand followed by a steady reduction until 5 pm is observed. From 5 pm, a fast increase in the demand leads to a maximum peak consumption around 9 pm. For the smallest population size, the midday peak derives mainly from schools and from electric cooking used by a fraction of the population. Fig. A.2 in the Annex illustrates the load profiles of each of the demand users separately. The importance of the results obtained in this section lies in the determination of the intra-day and intra-seasonal demand peaks. This information is key for the design of micro-grids, since generation capacity and storage are optimized to meet the peak load under a variety of load conditions during the year.
[...] highlands of Bolivia. On the right-hand side, stacked loads from residential, hospital, school and cooking for a population of 50 households from the c. lowlands and d. highlands of Bolivia. Note that not all settlements contain health and education centres. Refer to the "Population threshold for micro-grid analysis specific to Bolivia" section for further details.
Fig. 10. Boxplots of the LCOE obtained for solar-only, hybrid and diesel-only micro-grids for all instances described in the "Step 3. LCOE surrogate model and look-up result dataset" section. Results are selected for three representative settlement sizes.

Micro-grid optimization
The economic competitiveness of solar, diesel and hybrid micro-grids depends largely on location-specific data such as demand, solar resource, road accessibility (which influences the diesel cost at the site) and the other techno-economic parameters detailed in Table 2. When international diesel prices are applied, hybrid micro-grids are more cost-competitive than diesel micro-grids. However, when large diesel subsidies are applied (such as a subsidized price of 0.18 US$/liter), diesel-only micro-grids are more competitive than hybrid micro-grids.
In all instances, solar-only micro-grids have a higher LCOE than hybrid and diesel-only micro-grids, since solar-only micro-grids require a large battery to achieve the same LLP, which leads to higher investment costs. This is illustrated in Fig. 10 with boxplots of the LCOE results obtained in the optimization of all instances modelled. The "Micro-grid optimization box-plots" section in the Annex compares boxplots for two diesel price scenarios. In addition, the "Sensitivity analysis: considering cost-scenarios" section analyses the effect of selected techno-economic parameters on the electrification solution.

Multivariate regression results
Coefficients and statistics of the regression are presented in the "Multivariate regression results" section in the Annex. For the three types of micro-grids, the number of households is the parameter with the strongest influence on the LCOE. The second most influential parameter for diesel-only and diesel-PV hybrid micro-grids is the diesel cost, while for solar-only micro-grids it is the cost of the battery.

Reference scenario results applying OnSSET-enhanced algorithms to the case of Bolivia
The least-cost optimization results show that 87.8% of the population could be electrified with grid extension by 2025 compared to 76.3% in 2012. More specifically, grid extension could provide new electricity connections to 2.9 million people by 2025. Given the relatively dense coverage of the grid, a large number of new connections derive from grid extension (76% of the newly electrified population). Of the settlements electrified by grid extension, 99.4% are located within 10 km of the current grid network. Off-grid technologies supply electricity to dispersed populations far away from the grid.
Concerning micro-grids, hybrid and hydropower micro-grids could supply electricity to 1.9% and 0.2% of the newly electrified population, respectively, and diesel mini-grids to 1.6%. About 6% of the population electrified with micro-grids is within 10 km of high-voltage grid transmission lines and 36% within 10 km of existing off-grid medium-voltage lines. Diesel standalone and PV standalone systems could electrify 2.1% and 6.1% of the newly electrified population by 2025, respectively.
The investment cost per household varies widely depending on the electrification technology. For households electrified through grid extension, the investment cost increases with increasing distance to the transmission lines and decreases with increasing population density. The average cost of connecting a household to the grid amounts to $1159. All new grid connections in Bolivia will require $973 million.
Investment is estimated at $10 million for mini-grids, $26 million for micro-grids and $258 million for stand-alone systems. Table 4 summarizes the number of new connections per technology type, investment cost estimates and new capacity when using the OnSSET original and the enhanced framework for micro-grids. Through 2025, Bolivia will need to increase the grid capacity by 251 MW and the off-grid capacity by 59 MW in order to meet the increased residential demand and electrification targets indicated in our reference scenario. Table A.6 in the Annex describes the expected energy mix of the grid based on committed generation projects towards 2025. Assuming this generation mix together with our results, we estimated that 62% of the additional generating capacity needed to achieve universal access goals in Bolivia would derive from renewable technologies.

Differences in the results when applying OnSSET enhanced and original algorithms
The original OnSSET algorithms do not include hybrid micro-grids or mini-grids; hence, we cannot compare the LCOE of this technology between the two methodologies. Nonetheless, we can compare results on generation capacity and LCOE for solar-only and diesel-only micro-grids for both methodologies. Fig. 11 compares the results when applying the OnSSET original and enhanced algorithms to the lowland communities within the population threshold described in the "Population threshold for micro-grid analysis specific to Bolivia" section. Since the population of the communities in the highlands (within the population threshold) is smaller than in the lowlands, we present the results separately. The "Differences in the results when applying OnSSET-enhanced and original algorithms" section in the Annex presents results for the populations in the highlands.
In general, due to the intermittent availability of solar resources, solar-only micro-grids are larger in capacity than the equivalent diesel-only micro-grids supplying the same demand with the same reliability considerations (Fig. 11.a). Since the original algorithms do not optimize the capacity, the size of diesel micro-grids calculated by the enhanced methodology is 2.5 times smaller on average than in the original methodology (Fig. 11.a). Conversely, since the sizing algorithms in the original version do not include detailed reliability considerations, the capacity of solar-only micro-grids in the enhanced version is 35% higher on average than in the original methodology.
As a consequence of these differences in sizing, the LCOE values calculated by the enhanced methodology are on average 16% lower for diesel-only micro-grids and 11% higher for solar-only micro-grids than those of the original methodology (Fig. 11.b).
Regarding hybrid micro-grids, their selection over mono-source micro-grids is a matter of costs, as shown in the "Reference scenario results applying OnSSET-enhanced algorithms to the case of Bolivia" section. The role of fuel and investment costs is further discussed in the "Sensitivity analysis: considering cost-scenarios" section with selected cost scenarios. Table 4 compares the optimal results of the OnSSET original and enhanced methodologies, showing important differences across all technologies. A significant reduction in standalone systems is observed in the results of the OnSSET enhanced model compared to the original version. This is explained by the significant oversizing of diesel micro-grids in the original version compared to the enhanced methodology, which increases their LCOE and reduces their cost-competitiveness against standalone systems.
Similarly, there is a very marginal reduction in grid-extension connections when comparing the results from the original OnSSET to the new framework. This is expected, since the LCOE of diesel micro-grids is lower in the results of the enhanced model and, therefore, diesel micro-grids are more cost-competitive than grid extension in certain communities.
Comparing the number of people electrified with micro-grids in Table 4, the solution with the enhanced model has a larger share of micro-grids, accounting for 79 thousand newly electrified people compared to 44 thousand people with the original OnSSET code. Table 4 also compares the percentage of new connections per technology type for both methodologies. We observe that, for several communities, the least-cost technology switches from grid extension, diesel micro-grids, solar micro-grids, standalone solar and standalone diesel systems to hybrid micro-grids. Fig. 12 further illustrates the differences between the two methodologies as a percentage of the 72.4 thousand people newly electrified by hybrid micro-grids. Notice that these 72.4 thousand people were electrified with a mix of other technologies in the OnSSET original version. Fig. 13 shows the geospatial least-cost electrification solution for the original and enhanced OnSSET formulations for every community of the Census database. The results for micro-grids are highlighted in the map with circle symbols. Results from the enhanced model have 208 communities electrified with micro-grids compared to 101 communities in the results from the original model. Note that the higher number of settlements electrified with hybrid micro-grids in the enhanced methodology is due to differences in LCOE and does not relate to their spatial distribution.

Sensitivity analysis: considering cost-scenarios
To highlight the results of the cost scenarios (see scenario details in the "Cost scenarios" section), we report three indicators for the communities that belong to the population threshold considered for micro-grids (see the "Population threshold for micro-grid analysis specific to Bolivia" section): 1. number of new connections, 2. new capacity added and 3. investment costs per type of electrification technology.
The settlements selected for this analysis have a population between 50 and 550 households with no connection to the grid in the base year. The reason for this selection is to exclude from the cost-scenario analysis all settlements with little demand for which standalone systems (either PV or diesel) are the only solution, and settlements for which grid extension is the only solution.
Fig. 13. Geographical comparison of the least-cost electrification technology solution for OnSSET original and OnSSET enhanced for micro-grids. Micro-grids are plotted with circle-shaped symbols for diesel (red), hydro (cyan), PV (orange) and hybrid (yellow). Grid extension is plotted with cross (light blue), standalone diesel with cross (light red) and standalone PV with cross (light green) symbols. Notable differences in the number of micro-grids are observed when comparing the results from the OnSSET original and OnSSET enhanced methodology. a. Electrification technologies when applying OnSSET enhanced algorithms for micro-grids. b. Electrification technologies when applying OnSSET original algorithms.
Fig. 14 compares the results for new connections (Fig. 14.a), new capacity (Fig. 14.b) and investments (Fig. 14.c) per technology type. Between the scenarios with low and high capital cost (SC1 and SC2, respectively), the smallest settlements switch from hybrid micro-grids to standalone systems when capital costs increase; the opposite happens when capital costs decrease. This marginal difference, however, does not account for reliability differences between standalone and micro-grid systems.
For the first diesel price subsidy scenario (SC3), there is a slight increase in the cost-competitiveness of hybrid micro-grids and a significant increase for diesel standalone systems compared to the reference scenario. For the highest subsidy scenario (SC4), diesel-fuelled technologies gain cost-competitiveness against renewable technologies. The observed switch from micro-grid to standalone systems is explained by marginal differences in the LCOE and does not consider differences in reliability, as mentioned previously. Although standalone systems are more expensive per unit of power capacity, they do not require the expensive distribution lines and per-household connection costs of micro-grids (see Fig. 14.c). Further considerations on reliability should be introduced for standalone systems in the OnSSET algorithms to allow an effective cost comparison with micro-grids, since standalone systems provide only limited hours of electricity.

Discussion
In this article, we developed an innovative methodology that bridges the "computational gap" between technically detailed micro-grid systems analyses and large-scale electrification modelling. This is achieved by surrogate models deriving from multivariate regressions of micro-grid optimization results for multiple scenario instances. Each scenario instance consists of a combination of settlement size and other important techno-economic parameters influencing the LCOE.
Surrogate models were directly coupled to a mature electrification tool, OnSSET, to calculate the LCOE of micro-grids in an automated fashion. Unlike the sizing algorithms used to model mono-source micro-grids, hybrid micro-grids require a specialized model that optimizes generation and storage capacities to determine the least-cost configuration. Therefore, representing hybrid micro-grids by means of a simplified model would compromise technical accuracy.
Further expansions of this methodology could include a broader portfolio of micro-grid technologies and fuels (Step 2), such as wind-diesel, wind-gas, solar-gas or other multi-source micro-grid possibilities. Similarly, the methodology could be expanded to assess larger systems such as mini-grids, which require additional surveyed information on appliance ownership and usage habits, among others.
Deriving from the cost sensitivity analysis, we found that diesel cost is the main driving factor for the choice between standalone and micro-grid systems. In the original OnSSET model, units of standalone systems can be aggregated in a modular fashion to supply demand. Since this simplified approach does not account for reliability, an effective cost comparison with micro-grids cannot be made. One way to improve the representation of standalone systems without increasing complexity in OnSSET could be to add a cost for loss of load in the cost function. Although we recognize the complexity of setting values for loss of load in rural areas, it is necessary to differentiate the desired attributes of peak capacity and reliability among technologies.
With regard to off-grid interconnection possibilities, OnSSET does not connect mini-grids or micro-grids that are close to each other. Further studies are required to look at how existing mini-grids or micro-grids could be interconnected to lower costs and ensure reliability (should newly electrified communities have their own generation units, or should the generation capacity be reinforced in the neighbouring location?).
With regard to the case study, one important caveat in our demand assessment is the exclusion of productive uses of electricity, such as electricity for agriculture and manufacturing. Since productive uses of electricity do not always appear immediately after electrification, potential demands for productive uses require careful assessment. Hence, further data and analysis are required to determine such electricity needs. Notwithstanding such limitations, the model generating load demands (Step 1) can be customized to include hypothetical productive demands when information is available.
Finally, it is worth mentioning that capacity expansion models and GIS-based electrification modelling could be coupled to provide a complete view of the investments required to reach universal access to electricity. Since investment decisions at grid level directly affect the grid electricity price and, therefore, the LCOE of grid expansion, they could also affect the selection of off-grid technologies. Moksnes et al. proposed a soft-link between an open-source capacity expansion planning tool, OSeMOSYS (see Howells et al. (2011)), and OnSSET to evaluate investments to reach universal access to electricity in Kenya (Moksnes et al., 2017). That work demonstrates that different technology configurations, grid and off-grid, are obtained when planning for either high-demand or low-demand scenarios. When demand is expected to be high, grid connections increase compared to off-grid solutions (Moksnes et al., 2017).

Policy considerations
Results from our cost-scenario analysis reveal how sensitive the electrification results are to diesel prices. Clearly, the continuation of diesel subsidies strongly reduces the economic competitiveness of local renewable energy resources.
Fossil fuel subsidies are remarkably widespread in developing countries for several socioeconomic reasons (Szabó et al., 2011). For a small economy, a meaningful change in subsidy schemes would produce large macroeconomic impacts through economy-wide changes in sectoral relative prices and demands (Coxhead and Grainger, 2018). Therefore, it may be counterintuitive to remove fossil fuel subsidies as a measure to foster energy access. Yet, in the same way that fossil fuel subsidies are used to promote affordable energy, renewable energy subsidies could be considered to compensate for this market distortion (Szabó et al., 2013).
As electrification planning diversifies with the inclusion of decentralized alternatives, different affordability and financing concerns emerge. Further enabling actors should be considered by electricity planners and policymakers to address the entire range of affordability concerns for both grid and off-grid rural consumers. More importantly, better coordination among national stakeholders is needed to develop a local renewable energy industry able to mobilize public finance towards rural electrification projects.
In the future, there will be important considerations around technology costs and security. While battery costs are dropping, we find that fossil fuel provides an inexpensive alternative at present. In the future, we must ask if battery prices will continue to drop as global demand for them (and their constituent materials) continues to rocket.

Conclusions
This research article represents a step forward in the formulation of geospatial electrification modelling tools: an innovative methodology was developed to maintain the technical accuracy of detailed load simulation and micro-grid optimization analyses in a large-scale geospatial electrification tool. The methods presented in this article offer an innovative solution to identify priority areas for micro-grids at early stages of rural electrification planning.

Annual demand estimation
For each scenario modelled using RAMP, 365 daily profiles are computed, each with a 1-minute time-step (later resampled to 1-hour resolution to couple with the micro-grid sizing model). Four seasons were distinguished: summer, autumn, winter and spring, with 90, 91, 92 and 92 profiles, respectively. For indoor and public lighting specifically, different time-frames were used in each season as a result of variations in sunrise and sunset times.
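The resampling step can be sketched as follows, assuming each daily profile is a list of 1440 power values (one per minute) and that the hourly value is the mean power over the hour. The function name and the toy profile are assumptions; the season lengths are those stated above.

```python
SEASON_DAYS = {"summer": 90, "autumn": 91, "winter": 92, "spring": 92}


def resample_to_hourly(minute_profile):
    """Average a 1-minute power profile into 1-hour steps."""
    if len(minute_profile) % 60 != 0:
        raise ValueError("profile length must be a multiple of 60 minutes")
    return [
        sum(minute_profile[start:start + 60]) / 60.0
        for start in range(0, len(minute_profile), 60)
    ]


# Toy daily profile in W: 100 W base load with a 30-minute, 700 W peak at 20:00.
day = [100.0] * 1440
for minute in range(20 * 60, 20 * 60 + 30):
    day[minute] = 700.0

hourly = resample_to_hourly(day)
print(len(hourly), hourly[20])
print(sum(SEASON_DAYS.values()))
```

Averaging preserves the energy of each hour but smooths peaks shorter than one hour, which is worth keeping in mind when the hourly series is used to size generation capacity.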

Hourly PV energy generation estimates
Over the past few years, the use of global meteorological data in reanalyses has emerged as an important source of synthetic information to estimate renewable energy availability. A key benefit of reanalyses is the provision of data for remote locations, where field measurements are usually not available. Data from an open web-based application was used to estimate hourly PV energy production.
Hourly data on the total incident radiation on the tilted PV surface (I_T,β) and the temperature of the PV cell (T_PV) are required to estimate hourly PV energy generation (E_PV). The incident radiation (I_T,β) is obtained from [1] and [2]. The temperature of the PV cell (T_PV) was estimated using Eq. (A.1), where NOCT is the nominal operating cell temperature, t is the time-period and T_amb is the ambient temperature.
The hourly PV energy generation of each array unit (E_PV,m) is then calculated following the methodology of Holmgren, Hansen, & Mikofski (2018), combined with a commercial PV database (Gosolarcalifornia, 2019). The energy output of the entire array is calculated with Eq. (A.2), where N_PV is the number of array units and η_inv is the efficiency of the inverter.
To represent all communities statistically, coordinates of the GHI quartiles at 25%, 50% and 75% were selected to extract GHI time series (Table A.2).
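For orientation, the two calculations can be sketched as below. The cell-temperature expression is the widely used NOCT relation (reference conditions of 20 °C ambient temperature and 800 W/m² irradiance), and the array output is taken as the product of the per-unit energy, the number of units and the inverter efficiency. Both forms are assumptions consistent with the symbols defined for Eqs. (A.1) and (A.2), not a transcription of them, and the default NOCT value is a placeholder.

```python
def cell_temperature(t_amb_c, irradiance_w_m2, noct_c=45.0):
    """PV cell temperature in deg C from the standard NOCT relation."""
    return t_amb_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2


def array_energy(unit_energy_kwh, n_units, inverter_efficiency):
    """Hourly energy of the whole array in kWh (assumed product form)."""
    return n_units * unit_energy_kwh * inverter_efficiency


# Hypothetical hour: 25 deg C ambient, 800 W/m2 on the tilted plane.
print(cell_temperature(25.0, 800.0))
# Hypothetical array: 10 units producing 0.2 kWh each, 95% inverter efficiency.
print(array_energy(0.2, 10, 0.95))
```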

Geospatial datasets and assumptions
For the centralized grid generation, the average investment cost was assumed to be 1655 $/kW based on committed investments in power generation in the national grid. The expected energy mix by 2025 is detailed in Table A.6 (Peña Balderrama, Alfstad, Taliotis, Hesamzadeh, & Howells, 2018). The grid generating cost of electricity represents the cost of producing 1 kWh of electricity and does not reflect the customer tariff. It was assumed to be 0.1223 $/kWh. Costs related to grid extension are detailed in Table A.5, and techno-economic parameters for the off-grid electrification technologies are detailed in Table A.7. Finally, all cost-related comparisons are performed in present value. The discount rate was set at 12% in line with the Ministerial Resolution 01/200 of the government of Bolivia, specific to investments in electrification infrastructure.
Wind speed annual average. Provides information about the wind velocity (m/s) over an area. It is used to identify the availability/suitability of wind power (through capacity factors). Type: raster. Source: ESMAP et al. (2018).
Travel time. Accounts spatially for the travel time required to reach, from any individual cell, the closest town with a population >50,000 people. It is used to estimate diesel transportation costs. Type: raster. Source: Weiss et al. (2018).
Digital elevation map. Used in the energy potential estimation, restriction zones and grid extension suitability map. Type: raster. Source: Farr et al. (2007).
Land cover. Land cover maps are used in a number of processes in the analysis (energy potentials, restriction zones, grid extension suitability map etc.). Type: raster. Source: GeoBolivia (2011).
Poverty. Provides socio-economic information on the population using the Unsatisfied Basic Needs (UBN) multidimensional method, in which a headcount ratio of the population living under five poverty categories is provided. It is used to disaggregate the population into 4 consumption levels. Type: polygon vector. Source: GeoBolivia (2012a).
Health centers. Locations of health centers as a vector containing relevant attributes to estimate potential electricity demand (health post, health center without international [...]). Type: point vector. Source: GeoBolivia (2012b).
Techno-economic input parameters
Summary of appliances and usage timings in El Espino and Toconao
* The fridge follows a specific ad-hoc cycle and does not run at full power for 24 h. ** Total duration and functioning windows depend on seasonal sunrise and sunset times. *** Depending on population size.
Table A.10 details the regression results and respective statistics for each surrogate model (see structure in Eq. (A.3)). From the regression diagnostic tests, we conclude that the accuracy of the regression fit is sufficient, with low scores reported for the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). For solar-only micro-grids specifically, the model was not able to predict the LCOE as accurately as the other two models, due to the non-linear characteristics of battery sizing. The R-squared scores are higher than 95% for all models, with the only exception of solar-only micro-grids, at 92%. The Durbin-Watson (DW) residual autocorrelation test shows no autocorrelation in any of the models (DW = 2 means no autocorrelation). Each coefficient of the regression is statistically significant, with small standard errors, and no multicollinearity was found with the Variance Inflation Factor (VIF) test (since a Latin hypercube sampling method is applied to all instances prior to optimization).
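The diagnostic scores mentioned above follow standard definitions and can be sketched in a few lines. The function name and the toy data are assumptions; the values printed are for the toy data only and have no relation to Table A.10. Note that the Durbin-Watson statistic depends on the order in which the residuals are listed.

```python
import math


def regression_diagnostics(actual, predicted):
    """Return MAE, RMSE, R-squared and the Durbin-Watson statistic."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Durbin-Watson: squared successive residual differences over ss_res;
    # values near 2 indicate no first-order autocorrelation.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / ss_res
    return {
        "mae": sum(abs(e) for e in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "durbin_watson": dw,
    }


# Toy LCOE values (US$/kWh) and surrogate predictions.
scores = regression_diagnostics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
print(scores)
```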

Multivariate regression results
Micro-grid optimization box-plots