Formulation and evaluation of long-term allocation problem for renewable distributed generations

The penetration rate of renewable energy in power systems has been increasing in many countries. Governments and international energy organisations have announced long-term visions covering, for example, power generation and the penetration rate of renewable energy. However, specific installation plans and scenarios have not been discussed or determined. Although many studies on the optimal allocation of renewable energy-based distributed generations (DGs) have been published in the past decade, most of them have considered only 1 year's allocation and daily annual system operation. Therefore, this study proposes a novel scenario-based two-stage stochastic programming problem for the long-term allocation of DGs. A scenario generation procedure is also presented for solving the problem. Furthermore, few studies have focused on evaluating and analysing the optimal solutions from various aspects. Thus, this study shows the optimal allocation results on a 34-bus distribution network, considering different scenario generation methods and numbers of scenarios. The authors finally discuss the important points and indicate that decision makers have to consider several important issues. In addition, they developed and released a general-purpose framework for optimal allocation of DGs as open source.


Background
Distribution companies (DISCOs) need to deal with the intermittent nature of renewable energy sources (RESs), such as wind speed and solar radiation, if RES-based distributed generations (DGs) are actively installed in the distribution network, in order to maintain the demand-and-supply balance continuously and accommodate demand growth over the planning horizon [1]. DGs refer to small-scale energy generators, which are generally used to guarantee that sufficient energy is available to meet peak load. DG planning (DGP), which determines the optimal siting, sizing, and timing, is modelled to tackle the above problem. One of the objectives of DGP is to ensure a reliable power supply to consumers at the lowest possible cost. DGP plays an important role as strategic-level planning in modern power system planning. Recently, stochastic programming and metaheuristic-based approaches, which can deal with the various uncertainties in energy planning, have been used to solve the DGP [2,3].

Related work
Much attention has been paid to solving real stochastic problems for both single and multi-resource capacity planning. Scenario-based techniques have been proposed to consider various uncertainties [4-6].
In power system planning for transmission and distribution networks, many approaches have been developed that consider RESs, energy conversion, and transmission, as well as the uncertainties caused by demand, pricing, and intermittent renewables [7]. In [8], energy planning for individual large energy consumers was formulated as a mixed-integer linear programming (MILP) model using fuzzy parameters. Atwa et al. [9] provided a probabilistic mixed-integer non-linear problem for distribution system planning.
Several studies have proposed the stochastic optimisation of DGP [10,11]. In [12], the uncertainties of wind energy, photovoltaic (PV), and energy storage system (ESS) were produced as chronological ones for solving a two-layer simulation-based allocation problem. Pereira et al. [13] formulated an allocation problem of Var compensator and DG as a mixed-integer non-linear problem and solved it using metaheuristic algorithms.
A two-stage architecture is commonly used for the formulation of stochastic programming problem. At the first stage, DGs' investment variables are determined before realisations of random variables are known, i.e. scenarios. At the second stage, after scenarios become known, operation and maintenance variables that depend on scenarios are solved. Carvalho et al. [14] modelled a two-stage scheme problem of distribution expansion planning under uncertainty in order to minimise an expected cost along the horizon. This model was solved by the proposed hedging algorithm in an evolutionary approach to deal with scenario representation efficiently. In [15], a two-stage stochastic programming for capacity expansion planning was provided in a power system of Japan. This includes the uncertainties of the demand, carbon tax rate, and operational availability. In [16], a two-stage robust optimisation-based model considering uncertainties of DG outputs and demand was provided for the optimal allocation of DGs and micro-turbine. Montoya-Bueno et al. [17] proposed a two-stage stochastic multi-period mixed-integer linear programming model of renewable DG allocation problem considering the uncertainties affected by demand and renewable energy production.
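The two-stage structure described above can be illustrated with a deliberately tiny example. The sketch below is not the model of this paper: the capacity options, unit costs, and the three demand scenarios are made-up numbers, and the first-stage decision is found by plain enumeration rather than by a solver.

```python
# Toy two-stage stochastic program solved by enumeration.
# All numbers are illustrative assumptions, not data from the paper.

# Scenarios: (probability, demand in MW); probabilities sum to 1.
SCENARIOS = [(0.3, 2.0), (0.5, 4.0), (0.2, 6.0)]

INVEST_COST = 3.0    # cost per MW of installed capacity (first stage)
PURCHASE_COST = 5.0  # cost per MW bought to cover a shortfall (second stage)


def second_stage_cost(capacity, demand):
    """Recourse cost once the demand scenario is known."""
    shortfall = max(0.0, demand - capacity)
    return PURCHASE_COST * shortfall


def expected_total_cost(capacity):
    """First-stage cost plus the probability-weighted recourse cost."""
    recourse = sum(p * second_stage_cost(capacity, d) for p, d in SCENARIOS)
    return INVEST_COST * capacity + recourse


def best_capacity(options):
    """Pick the first-stage decision with the lowest expected total cost."""
    return min(options, key=expected_total_cost)


if __name__ == "__main__":
    options = [0.0, 2.0, 4.0, 6.0]
    for x in options:
        print(x, expected_total_cost(x))
    print("best:", best_capacity(options))
```

The investment is fixed before the demand is revealed, while the purchase decision adapts to each scenario; this is the same split between investment variables and O&M variables that the cited formulations use at much larger scale.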
In solving the two-stage stochastic programming problem, an effective methodology for creating proper scenarios is needed to represent the various uncertainties, because it is very difficult to realistically obtain all of the information about the uncertainty and computationally incorporate it into the model. If probability distributions are analytically estimated and used instead, the problem commonly becomes very complex, even if it is small. Hence, when only partial information about the uncertainty is available, the stochastic programming model normally needs to be solved using scenarios. Many scenario generation techniques exist [18-25].
Most scenario generation methods have not considered the correlation between the uncertainties (e.g. demand and solar radiation), and some require manually set parameters [19,25]. It is necessary, however, to create appropriate scenarios automatically, taking into account the correlations based on historical data. For the optimisation problem mentioned above, many studies of the optimal DG allocation problem that take the uncertainties into account have been performed. However, most of them have considered only 1 year's allocation and daily annual system operation. Realistically, in order to accomplish optimal system operation over multiple periods, the long-term optimal siting, sizing, and timing should be obtained. Additionally, few studies have focused on evaluation and analysis in terms of the scenario generation method and the number of scenarios. Table 1 summarises the originality of this paper.

Contribution
This paper provides three main contributions:
• A new scenario generation method with K-means is proposed to create scenarios automatically. This procedure uses historical data and can be implemented readily. If the K-means algorithm is simply applied to the available data, it is not possible to take into account the seasonal characteristics and the correlations between demand and meteorological data. Therefore, in the proposed approach, K-means clustering is applied in stages, focusing on seasons and demand. Many scenarios of demand, wind speed, and solar radiation are generated, and an appropriate (not equal) probability is calculated for each scenario from the divided clusters.
• A new long-term allocation problem of RES-based DGs is proposed. It is formulated as a two-stage stochastic programming problem with the objective of minimising the total system cost. Several electrical devices and constraints for improving the distribution system are considered [i.e. limitation of reverse power flow, generation of DG considering lagging-leading power factor, and capacitor bank (CB)]. Furthermore, carbon emission costs and incentives are considered from the viewpoints of international trends and economics: the challenge of global warming is actively discussed at the Conference of the Parties to the UNFCCC with the aim of a clean environment, and governments generally subsidise DISCOs that invest in RESs in their distribution systems in order to reach high renewable penetration levels.
• To evaluate and analyse the optimal solutions from various aspects, we compare the proposed scenario generation approach with existing ones. We also investigate the impact of the number of scenarios on the optimal solutions. In addition, we developed a framework for optimal allocation of DGs as open source, where users can simulate under various conditions.

Paper organisation
The structure of this paper is organised as follows. The details of the proposed scenario generation procedure are described in Section 2. The stochastic programming model is provided in Section 3. The results of the numerical simulations are presented and discussed in Section 4. Finally, Section 5 provides the summaries and insights of this paper.

Scenario generation
This section describes the proposed scenario generation method, which applies K-means to historical data in stages. The goal is to obtain the scenario levels of demand, electricity price, wind speed, and solar radiation for creating specific scenarios. The role of K-means is to classify an original dataset into a certain number of clusters K. The centroid of each cluster is the mean value of the data allocated to that cluster. The details of the algorithm are given in [30].
Historical data need to be available for scenario creation, i.e. hourly demand, wind speed, solar radiation, temperature, and electricity price for the 8760 h of the year. Fig. 1 shows the overview of the proposed scenario generation. The steps are described below:
• Step (1): Divide the historical data into seasonal groups.
• Step (2): Apply K-means (number of clusters $K = K_{\mathrm{step2}}$) only to the demand in each seasonal group created in step 1, and allocate each data point to one of the $K_{\mathrm{step2}}$ groups. Fig. 2 shows an example of the divided clusters of the demand. The wind speed, solar radiation, temperature, and price indexed to each demand data point are allocated to the same cluster as the demand. Each divided group is defined as a demand block b, which corresponds to a representative demand cluster (e.g. peak load of summer, middle load of summer, and low load of winter). The total number of hours in demand block b is recorded for later use in the objective function.
• Step (3): Apply K-means again ($K = K_{\mathrm{step3}}$) to the demand, wind speed, and solar radiation of each data group created in step 2, so that $3 \times K_{\mathrm{step3}}$ data groups are created per demand block. Steps 3-5 in Fig. 1 focus on the flow for one of the data blocks from step 2.
• Step (4): The mean values of each data block in step 3 are used as block representatives to create the factors of demand, wind speed, and solar radiation. Note that the price levels are determined by the mean values of the price within each demand block. The renewable production models in [9,31] are used in this paper to transform the renewable observation data into their equivalent output power (i.e. wind generation factor and PV generation factor).
• Step (5)
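The staged use of K-means in steps 1-4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: `kmeans_1d` and `generate_scenarios` are hypothetical helper names, the input records are synthetic, and the final combination rule (every demand level paired with every wind and solar level, with the probability taken as the product of the cluster shares) is an assumption, because the text does not spell out the last step.

```python
# Staged K-means sketch: cluster demand first, then cluster each variable
# inside every demand block. Standard library only; input data are synthetic.
import random
from itertools import product


def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalars; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def assign():
        return [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    return centroids, assign()


def generate_scenarios(records, k_step2, k_step3):
    """records: dicts with keys 'season', 'demand', 'wind', 'solar'.
    Returns {(season, block): [(probability, demand, wind, solar), ...]}."""
    scenarios = {}
    for season in sorted({r["season"] for r in records}):        # step 1
        group = [r for r in records if r["season"] == season]
        _, block_of = kmeans_1d([r["demand"] for r in group], k_step2)  # step 2
        for b in range(k_step2):
            block = [r for r, lab in zip(group, block_of) if lab == b]
            if not block:
                continue
            levels = {}
            for key in ("demand", "wind", "solar"):               # step 3
                k = min(k_step3, len(block))
                cents, labs = kmeans_1d([r[key] for r in block], k)
                # Step 4: the cluster mean is the level; its share of the
                # block's hours is used as the weight of that level.
                levels[key] = [(labs.count(j) / len(block), cents[j])
                               for j in range(k) if labs.count(j)]
            # ASSUMPTION: full combination with product-of-shares probability.
            combos = product(levels["demand"], levels["wind"], levels["solar"])
            scenarios[(season, b)] = [(p_d * p_w * p_s, d, w, s)
                                      for (p_d, d), (p_w, w), (p_s, s) in combos]
    return scenarios


if __name__ == "__main__":
    rng = random.Random(0)
    data = [{"season": h % 2, "demand": rng.uniform(0.3, 1.0),
             "wind": rng.random(), "solar": rng.random()} for h in range(400)]
    for key, scen in sorted(generate_scenarios(data, 4, 3).items()):
        print(key, len(scen), round(sum(p for p, _, _, _ in scen), 6))
```

Because the second clustering pass runs inside each demand block, the wind and solar levels are conditioned on the season and the demand level, which is how the staged procedure retains part of the correlation that a single K-means pass over all data would lose.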

Optimisation problem for long-term DG allocation
Two-stage stochastic linear programming is used to formulate the long-term allocation problem of DGs. The problem uses some scenarios and provides the optimal siting, sizing, and timing of RES-based DGs to be installed (wind power and PV).

Objective
This model minimises the total system costs, which consist of the investment cost $\pi_t^{\mathrm{inv}}$ and the operation and maintenance (O&M) cost $\pi_t^{\mathrm{om}}$, taking the incentive $\mu_t^{\mathrm{inc}}$ into account. The expected value of the O&M cost in year t is taken over the demand blocks and scenarios, where $B_t$ is the set of demand blocks in year t, $N_{t,b}^{\mathrm{hours}}$ is the total hours of demand block b in t, $S_{t,b}$ is the set of scenarios in t and b, $\mathrm{Pr}_{t,b,s}$ is the probability of scenario s in t and b, and $\pi_{t,b,s}^{\mathrm{om}}$ is the O&M cost per unit time in t, b, and s. In this paper, it is assumed that the demand blocks and scenarios are the same every year since, in the same region, the trend of the demand profile and the average of the weather data are considered not to change significantly. Note, however, that the operational environment of the power system differs from year to year, because time-dependent parameters such as the demand growth factor, discount rate, and price increasing factor exist even though the scenarios do not change.
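Read together, the symbol definitions above suggest the following form for the expected annual O&M cost. It is a reconstruction from those definitions, offered as an assumption rather than as the authors' exact expression: each scenario's hourly cost is weighted by its probability and by the number of hours in its demand block.

```latex
\pi_t^{\mathrm{om}}
  = \sum_{b \in B_t} N_{t,b}^{\mathrm{hours}}
    \sum_{s \in S_{t,b}} \mathrm{Pr}_{t,b,s}\, \pi_{t,b,s}^{\mathrm{om}}
```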
Therefore, the aim of the model is to minimise the total system costs over the planning horizon T, with each year's cost weighted by the present worth factor.
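A minimal sketch of how such an objective value could be evaluated for given yearly costs is shown below. Two points are assumptions made for illustration, since the corresponding expressions are not available here: the present worth factor is taken as 1/(1+d)^t for discount rate d, and the incentive is subtracted from the yearly cost. The function names are hypothetical.

```python
# Discounted total system cost over a planning horizon (illustrative sketch).
# ASSUMPTIONS: present worth factor 1/(1+d)^t; incentive reduces the cost.


def expected_om_cost(blocks):
    """blocks: list of (hours, [(probability, hourly_om_cost), ...])."""
    return sum(hours * sum(p * c for p, c in scen) for hours, scen in blocks)


def present_worth_factor(discount_rate, year):
    """Discounts a cost paid in `year` (1-based) to present value."""
    return 1.0 / (1.0 + discount_rate) ** year


def total_system_cost(investment, om, incentive, discount_rate):
    """Sum of discounted (investment + O&M - incentive) over all years."""
    total = 0.0
    for t, (inv, o, inc) in enumerate(zip(investment, om, incentive), start=1):
        total += present_worth_factor(discount_rate, t) * (inv + o - inc)
    return total


if __name__ == "__main__":
    # One demand block of 100 h with two equally likely scenarios.
    om_year = expected_om_cost([(100, [(0.5, 2.0), (0.5, 4.0)])])
    print(om_year)
    print(total_system_cost([50.0, 50.0], [om_year, om_year], [5.0, 0.0], 0.05))
```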

Investment costs:
The following equations show the investment costs of the substation, wind turbine, PV, and CB. The costs are annualised using the respective annuity factors. The previous year's investment cost is added to the next one, except for the first year:
$$\pi_t^{\mathrm{inv}} = \sum_{n \in SS} \pi_{\mathrm{anu}}^{SS} X_t^{SS,n} + \sum_{n \in L} \left( \pi_{\mathrm{anu}}^{PV} X_t^{PV,n} + \pi_{\mathrm{anu}}^{WD} X_t^{WD,n} + \pi_{\mathrm{anu}}^{CB} X_t^{CB,n} \right) + \pi_{t-1}^{\mathrm{inv}}, \quad t > 1.$$
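The annualisation and the year-to-year accumulation can be sketched as below. The annuity factor is assumed here to be the standard capital recovery factor d(1+d)^l / ((1+d)^l - 1) for discount rate d and lifetime l; the paper's own expression is not reproduced in this text, so this form and the function names are assumptions.

```python
# Annualised investment cost with accumulation over the years (sketch).
# ASSUMPTION: the annuity factor is the standard capital recovery factor.


def annuity_factor(discount_rate, lifetime_years):
    """Capital recovery factor; requires a strictly positive discount rate."""
    growth = (1.0 + discount_rate) ** lifetime_years
    return discount_rate * growth / (growth - 1.0)


def cumulative_annualised_cost(new_units_per_year, annualised_unit_cost):
    """Yearly investment cost: this year's new units plus last year's total."""
    costs, running = [], 0.0
    for units in new_units_per_year:
        running += annualised_unit_cost * units
        costs.append(running)
    return costs


if __name__ == "__main__":
    unit_cost = 1000.0 * annuity_factor(0.05, 20)  # annualised cost per unit
    print(cumulative_annualised_cost([1, 0, 2], unit_cost))
```

The running total mirrors the recursive term in the equation for t > 1: a unit installed in an early year keeps contributing its annualised cost in every later year.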

Operation and maintenance costs:
O&M costs are shown in the following equations:
$$\pi_{t,b,s}^{\mathrm{om}} = \pi_{t,b,s}^{\mathrm{loss}} + \pi_{t,b,s}^{\mathrm{ENS}} + \pi_{t,b,s}^{SS} + \pi_{t,b,s}^{\mathrm{new}} + \pi_{t,b,s}^{CB} + \pi_{t,b,s}^{\mathrm{emi}},$$
The total O&M cost includes the power loss cost (7), unserved energy cost (8), purchased energy cost (9), O&M cost of DGs and CB (10), (11), and carbon dioxide (CO$_2$) emission cost (12):
• Loss cost: $\pi_{t,b,s}^{\mathrm{loss}} = \pi^{\mathrm{loss}} \sum_{n,m \in N} S_{\mathrm{base}}\, r_{n,m}\, I_{t,b,s}^{\mathrm{sqr},n,m}$,
• Unserved energy cost
• Purchased energy cost
• O&M cost of DGs
• O&M cost of CB
• CO$_2$ emission cost
• CO$_2$ emission cost of DGs

Incentive:
An incentive is paid for new investment in DGs according to the subsidy rate.

Power balance constraints:
The following constraints describe the active and reactive power balance of the load and substation buses. Note that the demand scenario, $\eta_{b,s}^{\mathrm{load}}$, is applied by multiplying it by the peak load of each bus; see (16)-(19). To transform the non-linear equation (21) into a linear one, the piecewise linear approximation described in [32] is used in this paper. The equation is linearised as follows:
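As a generic illustration of piecewise linearisation, the sketch below approximates a squared quantity x^2 on [0, x_max] by equal-width segments whose slopes are the chords of the parabola. That the non-linear term has this squared form is an assumption made for the example (it is common in linearised power flow models), and the function names are hypothetical; the exact scheme of [32] may differ.

```python
# Piecewise linear approximation of x**2 with equal-width segments (sketch).
# ASSUMPTION: the non-linear term is a square; names are illustrative only.


def square_segments(x_max, n_segments):
    """Segment width and chord slopes of x**2 on [0, x_max]."""
    width = x_max / n_segments
    # Chord slope on [(y-1)*width, y*width] is (2y - 1) * width.
    slopes = [(2 * y - 1) * width for y in range(1, n_segments + 1)]
    return width, slopes


def approx_square(x, x_max, n_segments):
    """Piecewise linear value of x**2 for 0 <= x <= x_max."""
    width, slopes = square_segments(x_max, n_segments)
    remaining, total = x, 0.0
    for slope in slopes:
        fill = min(width, max(0.0, remaining))  # segments are filled in order
        total += slope * fill
        remaining -= fill
    return total


if __name__ == "__main__":
    for x in (0.5, 1.5, 2.0, 3.3):
        print(x, x * x, approx_square(x, 4.0, 4))
```

The approximation is exact at the breakpoints and slightly overestimates in between, and the error shrinks as the number of segments grows. In a linear programme each segment becomes a bounded continuous variable; because the slopes increase, a cost-minimising solution fills the segments in order without extra binary variables.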

Current, voltage, and power limits:
The current on branches, voltage of buses, and power flow on branches should be limited in the allowable range

Maximum DG size limits:
The following constraint defines the maximum DG installation capacity of each bus:

DG and CB generation limits:
Constraints (40)-(42) express the minimum and maximum generation of DGs and CB. Note that the scenarios of wind power and PV, i.e. the production factors $\eta_{b,s}^{WD}$ and $\eta_{b,s}^{PV}$, are applied by multiplying them by the maximum available output of each installed DG. The following constraints show the maximum available output in each year:
$$P_t^{\mathrm{avl},WD,n} = P^{WD} X_t^{WD,n} C^{WD,n}, \quad n \in L,\ t = 1,$$
$$P_t^{\mathrm{avl},PV,n} = P^{PV} X_t^{PV,n} C^{PV,n} + P_{t-1}^{\mathrm{avl},PV,n}, \quad n \in L,\ t > 1, \quad (46)$$
The number of installations of DG and CB in each bus is limited. The constraints on the reactive power produced by DGs are expressed using the leading/lagging power factors:
$$-\tan(\cos^{-1}(\lambda_{\mathrm{lead}}^{WD}))\, P_{t,b,s}^{WD,n} \le Q_{t,b,s}^{WD,n} \le \tan(\cos^{-1}(\lambda_{\mathrm{lag}}^{WD}))\, P_{t,b,s}^{WD,n},$$
$$-\tan(\cos^{-1}(\lambda_{\mathrm{lead}}^{PV}))\, P_{t,b,s}^{PV,n} \le Q_{t,b,s}^{PV,n} \le \tan(\cos^{-1}(\lambda_{\mathrm{lag}}^{PV}))\, P_{t,b,s}^{PV,n}$$
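The reactive power bounds above translate directly into a small numerical check. The sketch below evaluates the tan(cos^-1(λ)) limits for a given active power and the production-factor cap on DG output; the function names and the sample power factor of 0.95 are illustrative assumptions, not values taken from the paper.

```python
# Reactive power window from leading/lagging power factors (sketch).
# Function names and sample values are illustrative assumptions.
import math


def reactive_power_limits(active_power, pf_lead, pf_lag):
    """Return (q_min, q_max) allowed for a given active power output."""
    q_min = -math.tan(math.acos(pf_lead)) * active_power
    q_max = math.tan(math.acos(pf_lag)) * active_power
    return q_min, q_max


def max_active_output(production_factor, available_capacity):
    """Upper bound on DG output in one scenario: factor times capacity."""
    return production_factor * available_capacity


if __name__ == "__main__":
    p = max_active_output(0.4, 5.0)          # e.g. 40% wind factor, 5 MW
    print(p, reactive_power_limits(p, 0.95, 0.95))
```

A power factor of 1 collapses the window to zero reactive power, while lower power factors widen it in proportion to the active power actually produced in the scenario.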

Investment limits:
The following constraints refer to the annualised and actual investment cost limits considering the lifetime:

Energy not supplied limits:
The unserved power must be less than the demand

Substation limits:
The following constraints show the generation limit of the substation:
$$0 \le Q_{t,b,s}^{SS,n} \le \tan(\cos^{-1}(\lambda^{SS}))\, P_{t,b,s}^{SS,n},$$
$$S_t^{\mathrm{avl},SS,n} = S^{SS,n} + S_t^{\mathrm{new},n}, \quad n \in SS,\ t \in T, \quad (59)$$
The substation expansion is allowed up to the maximum power

Numerical simulation
We developed and released a general-purpose framework for optimal allocation of DGs as open source at https://github.com/ikki407/DGOPT, where users can simulate under various conditions. The simulation code used in this paper is implemented there.

Distribution system
The 34-bus three-phase radial feeder, shown in Fig. 3, is used to test the proposed scenario generation and allocation problem. The system has 1 substation and 33 buses with/without load. Details of the network are given in [33].

Data and parameters
The simulation parameters are shown in Table 2. Actual load data of Tokyo Electric Power Company are used as demand. The wind speed and solar radiation are the meteorological observation data of Miyakojima Island in Japan from 1 January 2015 to 31 December 2015. A 20-year period is used as the planning horizon. The electricity price is used instead of the loss cost (i.e. $\pi^{\mathrm{loss}} = \pi_{b,s}^{SS}\, \eta_t^{SS}$). The problem is solved using Gurobi 6.5.0 [34] on a Linux-based computer with a 36-core Intel Xeon at 2.3 GHz and 256 GB of random access memory.

Simulation 1: proposed method
A total of 216 scenarios of demand, wind, PV, and price are created by setting the number of clusters to $K_{\mathrm{step2}} = 4$ and $K_{\mathrm{step3}} = 3$ (Table 3). This simulation considers the following three cases:
Case A: The investment is only allowed for the expansion of the substation, i.e. the right-hand side of (39) is zero.
Case B: All the constraints are considered.
Case C: Case B without the investment constraints (54) and (55).
The information about the model of case B is described in Table 4. Fig. 4 shows the cumulative investment in each year, which represents the optimal siting, sizing, and timing. The O&M costs and total system costs are shown in Table 5.

Discussion:
The installation of DGs plays an important role in reducing the total system cost, even though the investment costs increase. A significant contribution is that it drastically reduces the O&M costs (see Table 5). This is one of the general benefits of DG installation. From Table 5, the greatest cost savings occur in the emission cost, because the emission rate of the energy purchased at the substation is twice that of the DGs. Moreover, the loss cost and purchased energy cost are reduced, since most DGs are allocated around the terminal buses of the radial distribution system. As shown in Fig. 4, the DGs allow the substation expansion to be deferred. However, the results imply that the expansion cannot be avoided entirely, owing to the intermittent nature of renewable DGs and the demand growth.
The O&M cost of CB decreases even as the number of CBs increases, which implies that CBs co-exist well with the large amount of installed DGs. Without the budget constraints, nearly the same amounts of wind turbine and PV capacity are installed. When the budget constraints are considered, however, more wind power than PV is installed because of the high subsidy rate for wind.
It is worth pointing out that the DGs play an important role in system stability as well as in cost minimisation. The averages of the voltage deviations over all scenarios in the first and final planning years are illustrated per case in Fig. 5. The figure shows that the overall voltage drops as the demand increases over 20 years. In addition, the large installation of DGs makes the voltage amplitude more stable than in the case without DGs.

Simulation 2: comparison of scenario generation methods
This simulation compares the results of the following three scenario generation methods:
Case A: Proposed method.
Case B: Conventional sampling-based method [21].
Case C: Duration curve-based method [17].
In each method, 216 scenarios were generated. In case B, kernel density estimation (KDE) was used to create the complicated probability density function (Fig. 6). Fig. 7 shows the cumulative investments in each year, which represent the optimal siting, sizing, and timing. The O&M costs and total system costs are shown in Table 6.
From Fig. 7, the substation is expanded twice in cases A and B, but three times in case C. In the comparative methods, the first investments are made 2-3 years earlier and the investments in renewable energies begin in 7-8 years. Also, PV is not installed in case C. These results show that the optimal allocation differs depending on the scenario generation method. Therefore, to analyse the characteristics of the scenarios, the expected values of the random variables in each scenario generation method were calculated (Table 7). The table shows that the demand factors are almost equal across the generation methods, and that the proposed method creates scenarios in which the expected generation of wind power and PV is larger than in the comparative methods. In other words, the comparative methods estimate the expected generation of wind power and PV to be small. Because the substation had to be expanded at an early stage to ensure a stable supply, the budget available for renewable energies decreased. It is therefore considered that renewable energies were not invested in at the initial stage in the comparative methods. In case C in particular, the expectations of wind power and PV are smaller than in the other methods, which explains why the substation was expanded three times.
Consequently, the optimisation problem appears to be very sensitive to scenario-dependent parameters, because the optimisation results differ widely even though the expected values shown in Table 7 are not greatly different. A sensitivity analysis of the scenario-dependent parameters will be needed to further pursue the cause of the impact on the optimal results, but it is difficult because the optimisation problem is very large. Also, in case B, the relationship between the generation factors is 'wind > PV'. Besides, the subsidy rate is 'wind power (10%) > PV (5%)'. Therefore, it is considered that the investment in PV was zero in case C, even though the investment cost of the wind turbine was higher than that of PV.

Discussion
From Table 6, the total system costs of the comparative methods are larger than that of the proposed one, especially the purchased energy and emission costs. This is because the substation was expanded early. However, the system operation was more stable, as the energy-not-supplied cost was suppressed. This is because the investment in renewable energy came late and its capacity was not large.

Simulation 3: impact of the different number of scenarios
To investigate how the number of scenarios affects the optimal solutions, several scenarios were created by using the proposed generation method (Table 8) and the optimisation problems were solved. Fig. 8 shows the cumulative investments in each year, which represent the optimal siting, sizing, and timing. The O&M costs and total system costs are shown in Table 9.

Discussion:
First, the optimal solution of case F was not found within a week because the problem is so large. Comparing cases B and C, although the number of scenarios is the same (so the amount of calculation is almost the same), the scenarios of case B focus on the magnitude of demand, whereas those of case C focus on the seasons. The scenarios of case B can represent stricter demand periods than those of case C, as the not-supplied cost of case B is ∼35 times that of case C (Table 9). The optimisation problem depends on the decision maker's environment. Therefore, considering the intermittent nature of renewable energy, decision makers should select the scenarios of case B when supply and demand are tight, or those of case C if the demand is not large and there is a trend in the weather. Cases D and E have the same number of scenarios, but case E does not emphasise the demand magnitude, as the number of divisions of the demand is small. Power system planning should not cause power outages, and it is desirable to cover the demand scenarios that would occur in reality. Hence, case E seems to be optimising under loose conditions. As a result, the total system cost of case E is ∼5% smaller than that of case D. Accordingly, it is important to divide the demand carefully in power system planning in order to represent appropriate and close-to-reality situations.
In Table 9, the total system costs decrease as the number of scenarios increases because there is no need to aggregate information of scenarios, that is, the expressive power of probability distribution increases. That is why more realistic situations can be considered. The realistic situation is the one, for example, where a scenario (probability 1) in a certain demand block is divided into two scenarios of high demand (probability 0.05) and low demand (probability 0.95). However, though it is described that increasing the number of scenarios will reduce the total costs, this can only be explained in the context of this numerical simulation.
From the three numerical simulations so far, it turns out that the optimal allocation is not necessarily stable, because it depends on the number, type (diversity), and creation method of the scenarios. As a means of addressing this problem, it is conceivable to increase the number of scenarios, improve the scenario generation method, and develop an algorithm that selects or integrates the optimal solutions obtained from various scenarios. However, although the optimal allocations were different, the optimal objective function values lie within a range of at most 6.3% in simulation 2 and 9.4% in simulation 3. Therefore, with regard to cost minimisation, the proposed optimal allocation problem is stable, independently of the scenario generation method.

Conclusion
This paper has presented a procedure for creating the scenarios of demand and DG generation with K-means. In addition, a long-term allocation problem of RES-based DGs has been formulated as a two-stage stochastic programming problem and tested on the 34-bus distribution system. The obtained results and insights are summarised below:
• By solving the proposed optimal allocation problem, the long-term optimal siting, sizing, and timing of RES-based DGs are determined. In addition, the investment in RES-based DGs has greatly contributed to the reduction of the total system costs and to system stabilisation.
• The stochastic DG allocation problem should be evaluated under various conditions, by performing several simulations with different numbers, types, and generation methods of scenarios, in order to measure its characteristics accurately, because the optimal solutions change sensitively depending on the situations and parameters considered by the decision makers.
• The proposed scenario generation method is effective for interpreting the optimal solutions, because it can select whether to emphasise the time period (season) or the demand information.
• By analysing the various optimal costs, it is shown that the proposed optimal allocation problem is stable from the cost-minimisation point of view, regardless of the scenarios used.
• For the purpose of ensuring transparency and contributing to the development of the field, we developed and released a general-purpose framework for optimal allocation of DGs as open source.
Future research includes the following:
• Sensitivity analysis of the scenario-dependent parameters to investigate their impact on the optimal results.
• Numerical simulation on a large system and development of solution algorithms that make it possible.
• Extension to second-order cone programming, chance-constrained programming, and multi-stage stochastic programming problems.

Acknowledgments
This work was supported by JST CREST grant number JPMJCR15K5, Japan. We also thank members of our laboratory for their help with this work.