Data granularity and the optimal planning of distributed generation

Research regarding the optimal planning of distributed generation is often based on coarse energy use and generation data, which does not accurately reflect real variations in energy profiles. This paper investigates the impact of this lack of temporal variation on the optimal planning of distributed generation. The problem of loss minimization in the residential setting is used as a guideline. The outcomes of a stochastic optimization model for energy profiles defined on different time aggregation levels are compared. At first glance, modeling on a finer time scale seems to affect optimal planning solutions, with a shift from variable stochastic sources to sources that provide constant generation. However, it turns out that the gains of using these new optimal solutions in terms of reducing energy losses are limited. The results suggest that for optimization purposes it is not necessary to use data at a resolution smaller than hourly time steps. If energy profiles are defined on time steps smaller than one hour, it is important that the full range of the stochastic fluctuations is taken into account, rather than evaluating a couple of scenarios.


Introduction
In recent years there has been an increased focus on 'smart' electricity systems that continuously monitor, match, and control energy use and generation in order to facilitate the implementation of onsite generation [1–4]. In the context of these developments, opportunities arise for intensive data collection. Whereas data used to be collected at the level of daily or hourly means, data is now collected and even used in real time [5,6]. Since household load and renewable generation are known to fluctuate at the level of minutes or even seconds [7], fine grained data collection could be relevant to provide an accurate picture of energy use and generation. Indeed, these new possibilities are embraced by researchers who are formulating increasingly precise energy profiles [8–11]. However, when it comes to simulating or optimizing the grid, the use of accurate data comes at the price of computational efficiency. Adding accuracy in the time dimension comes with the need to acknowledge the unpredictable nature of short term fluctuations, which requires highly complex models. Moreover, intensive data collection and storage is not without costs. The question arises whether the possible improvements to distributed generation (DG) planning owing to more accurate data outweigh the costs of acquiring, storing, and using the data.
In this paper the impact of using electricity use and generation data of higher temporal resolution on optimal DG capacity planning is analyzed. To guide the analysis, a general model for determining the capacities of DGs at which energy losses are minimized is employed, an elaboration on [12]. The focus of this paper is not on the calculated optimal capacities themselves, but on the influence of data choice on the calculated optimal capacities. First, the implications of high temporal resolution data for modeling flexibility are outlined. Then the impact of increased data granularity on estimates for the performance measure and on optimal capacity levels is evaluated by comparing the outcomes of the optimization model for energy profiles with different levels of time aggregation. The latter has drawn especially little scientific attention so far.

Granularity and system performance
Quite a few papers have evaluated the bias from the use of coarse data when calculating performance measures for the power distribution system. The resulting picture is ambiguous. As Bucher et al. [13] explain, much depends on which part of the system is being analyzed. For a realistic representation of maximum power, maximum voltage, or energy flows, a one minute time step seems appropriate, whereas for evaluating transient currents and voltages, smaller time steps are necessary. For broad overviews of energy flows, coarser data is sufficient. Moreover, whereas one hour time steps may be too coarse for analyzing data at the individual level (i.e. domestic loads), when looking at multiple households, many of the short term fluctuations are balanced out due to the aggregation of different profiles [14].
At least two papers focus specifically on the impact of granularity on measuring the mismatch of demand and generation (which is the input to loss minimization models). Wright et al. [15] use energy use information collected at a one minute time interval from seven different households to calculate the degree of measurement error in export and import proportions when this data is averaged over five, fifteen, or thirty minute time intervals. They conclude that five minute time steps provide a good balance between accuracy and the burden of data size. Cao et al. [16] consider not only the short run variability of demand, but also that of onsite photovoltaic (PV) generation. They simulate energy use and generation profiles on a one minute interval and from there construct profiles on five, ten, fifteen, thirty, and sixty minute time steps. Using this input they compare on-site energy fractions and on-site energy matching for different setups of the site. They show that using one hour averages leads to large biases compared to the one minute data. However, they cannot give specific guidelines with respect to the granularity. The size of the bias depends on the setting evaluated, so that in some cases five minute data may suffice, while in others much can be gained from using finer resolution data.
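The averaging experiment described above can be sketched in a few lines: starting from 1-minute profiles, the exported-energy fraction is recomputed after block-averaging to coarser time steps. The profiles below are synthetic stand-ins for illustration only, not the data of [15] or [16]:

```python
import numpy as np

def export_fraction(demand, supply, step):
    """Fraction of generated energy that is exported, after averaging
    1-minute profiles over `step`-minute blocks."""
    d = demand.reshape(-1, step).mean(axis=1)
    s = supply.reshape(-1, step).mean(axis=1)
    exported = np.maximum(s - d, 0.0).sum()
    return exported / s.sum()

rng = np.random.default_rng(0)
minutes = np.arange(1440)
# Synthetic 1-minute profiles (illustrative only): noisy demand and a
# midday PV bell curve with fast cloud-driven fluctuations.
demand = 0.5 + 0.2 * rng.random(1440)
pv = np.exp(-((minutes - 720) / 180.0) ** 2) * rng.uniform(0.3, 1.0, 1440)

for step in (1, 5, 15, 60):
    print(step, round(export_fraction(demand, pv, step), 3))
```

Because averaging lets surpluses and deficits within a block cancel, the export fraction can only stay the same or shrink as the step length grows, which is exactly the bias these papers quantify.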

Granularity and decision making
Even if one can assume that increased data accuracy leads to more reliable performance measurement of the power distribution system, this does not necessarily imply it improves capacity planning. The consequences of adding precision to the input of optimization models for DG capacity planning are rarely discussed in the scientific literature. Ochoa and Harrison [17] provide a first step in the discussion by advocating the use of multi-period ('energy') models for loss minimization instead of the popular one period ('power') models. Power models evaluate the system at one moment in time, using a snapshot of the system's performance for optimization. Energy models, on the other hand, make it possible to evaluate the system over a span of time, thereby accounting for time variations and time dependencies in demand and supply. Ochoa and Harrison show that using one period models leads to a downward bias in the performance measure and a resulting overestimation of optimal DG capacity levels.
The next step is to determine the appropriate length for these time periods. There are two recent papers in which the impact of using time periods shorter than one hour in optimal DG capacity planning is discussed, both in the context of cost minimization. Hoevenaars and Crawford [18] investigate the optimal capacities of the elements of a standalone hybrid power system for different temporal resolutions of the data, ranging from one second time steps to one hour time steps. They explain that the appropriate level of analysis depends on the components included in the model. Whereas the optimal amount of diesel fueled generation is highly dependent on the time step chosen, systems including solar modules, wind turbines, or batteries are less dependent on the granularity of the data used. Hawkes and Leach [19] look at optimal sizing of a combined heat and power (CHP) unit for a single household and find that the calculated optimal capacity based on hourly data is twice the optimal size found when using data at a five minute level. The resulting difference in costs can be up to 16%. They believe it is not necessary to use data at an even smaller granularity, as the improvement from the ten minute level to the five minute level is already rather limited.

A stochastic approach
When one wants to use high resolution data for decision making, the accompanying optimization model naturally becomes more complex. Not only does increasing the temporal granularity of energy profiles add data points, one also needs to acknowledge the unpredictable nature of short term fluctuations in energy use and generation, such that stochastic formulations are required. By adding granularity to the input of an optimization model one thus needs to switch from evaluating the performance of a system under a typical situation to evaluating the expected performance of a system given the complete range of possible situations. In this way the amount of information evaluated in the model increases exponentially and one must apply approximations to keep the model tractable. Not surprisingly, most models for optimal DG planning are defined in a coarse deterministic framework, as one can conclude from the extensive overview of models in Kumawat et al. [20]. They review and classify more than 70 high quality research papers on optimal DG planning published after 2010. They distinguish the following four approaches to modeling the load as input for the model (the percentage of reviewed papers falling within each classification is given in brackets):

1. one-load level with single case (16%)
2. multi-load levels (51%)
3. time-varying (practical system loads) (28%)
4. probabilistic generation considering uncertainties in load (5%)

Most of the examples mentioned, such as [21–24], use hourly, daily, or even monthly time steps. Only three of the papers [25–27] consider probabilistic generation. All of these consider load per hour, just as the more recently published paper within this category by Kayal et al. [28].
In applications dealing with the reliability of the power distribution system, elegant optimization models have been defined in which higher time granularity does not add complexity, borrowing from techniques used in telecom research, see Refs. [29–31]. The focus then lies on the performance of the system under extreme conditions. Unfortunately, such an approach does not readily apply to the type of optimization problems where the average performance of the system is the main concern, as is the case for loss minimization.
The aforementioned research on granularity and decision making acknowledges the stochastic nature of short term fluctuations by considering several likely data profiles (samples or snapshots) rather than one average profile. In this way the computational burden of the analysis can be kept at a reasonable level, without compromising on the complexity of the model. However, the stochastic nature of short term fluctuations could impact optimal decisions in ways that are not adequately captured by a small number of samples. This paper aims to take the full span of possible realizations into account by framing the problem in the language of stochastic optimization. This allows the complete range of stochastic fluctuations to be considered in the model, at the cost of the level of detail that can be included in the model.

Outline of the paper
In Section 2 the problem used for the calculation is presented and the consequence of adding precision is explained with respect to complexity and computational effort. Then in Section 3 the parameters and data underlying the numerical calculations are explained. In Section 4 the results of the different calculations are presented and compared, after which the conclusions and recommendations are presented in Section 5.

Loss minimization model
A general DG planning problem will serve as a guide for the analysis. A residential district attached to an LV/MV transformer is considered. The aim of the model is to find the optimal mix of DGs for this district in order to minimize (expected) energy losses, in line with [12]. The model provides the optimal capacities for a combination of generators. The physical or logical location of these generators in the network is not determined by the model. Each household can install one or more types of DG. The capacity of each DG unit is fixed.¹ Total energy losses are the result of the balancing process of energy use (demand) and generation (supply) over a span of time, which is an optimization problem in itself. Supply should equal demand at all times. To accomplish this, electricity can be exported to or imported from other neighborhoods via the grid. Also, the neighborhood has access to a storage unit that can be used to shift supply. In both these channels losses are incurred. Given the observed demand and supply, the optimal balancing strategy should be chosen so as to minimize energy losses.
In the stochastic optimization literature, a problem as defined above is referred to as a two-stage recourse problem. These types of problems consist of two sets of decisions: those made ex ante, before the realizations of the stochastic variables (stage 1), and those made ex post, after those realizations (stage 2). In our case the first stage entails the choice of the capacities of the DGs (before the exact levels of demand and supply are known), and the second stage entails the balancing process (as soon as demand and supply are observed).
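A minimal numerical sketch of this two-stage structure, with made-up distributions and a simplified quadratic recourse loss standing in for the full balancing model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage-2 scenarios: joint realizations of demand and per-unit DG output,
# drawn once up front (assumed distributions, for illustration only).
demand = rng.normal(1.0, 0.3, 2000)
per_unit = rng.uniform(0.0, 1.0, 2000)

def second_stage_loss(mismatch, beta=0.04):
    # Simplified recourse: any surplus or deficit is exchanged with the
    # grid, and transport losses grow quadratically with the flow.
    return beta * mismatch ** 2

def expected_loss(capacity):
    # Stage 1 fixes `capacity` before the realizations are known;
    # stage 2 then settles each scenario, and losses are averaged.
    return second_stage_loss(capacity * per_unit - demand).mean()

# Crude stage-1 search over candidate capacities.
candidates = np.linspace(0.0, 4.0, 41)
best = min(candidates, key=expected_loss)
print("capacity minimizing expected loss:", round(best, 2))
```

The point of the sketch is the ordering of decisions: the capacity must perform well on average over all scenarios, not just on one typical profile.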
A schematic representation of the problem can be found in Fig. 1. The mathematical formulation can be found in Appendix A. In Table 1 the notation that is used throughout this paper is introduced. In defining the problem, the aim has been to be as general as possible. The goal of this exercise is to make recommendations regarding the appropriate input (data profiles) for the model, not to make recommendations with respect to the output (optimal capacity levels) of the model.

Solution strategy
The capacities of the DGs are chosen such that expected losses are minimized. Calculating such an expected value exactly would imply the evaluation of iterated integrals, which is a computationally heavy task. Therefore, an algorithm is used that approximates the objective by sampling from the joint distribution of the random variables: Stochastic Decomposition, developed by Higle and Sen [32]. The algorithm is based on the concept of optimality cuts and works in an iterative way. In each iteration one sample is taken from the joint distribution of random variables describing the energy use and generation at each time point. Then, using all the samples collected up to that point, a plane supporting the objective function from below, the optimality cut, is constructed. Together these cuts form a lower bound to the objective function, which can be used as an approximation of the actual problem.
In every iteration, more information is thus added to the pool of data, so that reality can be approximated more closely the longer the algorithm runs. This also implies that the cuts constructed at the beginning of the algorithm are based on very limited information, such that these cuts can turn out to form too tight approximations of the objective. Therefore, in every iteration the cuts constructed in previous iterations are multiplied by a factor smaller than one, so that possibly invalid cuts slowly drop out. In this way the approximating problem forms a statistically valid lower bound to the average function throughout the algorithm. Large problems in numerical analysis are often ill conditioned [33], as also turns out to be the case for the problem at hand. The objective is quite flat around the optimum, such that convergence of the algorithm to the optimal solution is hampered. The updating process of the optimality cuts leads to loose bounds in the flat area, such that solutions keep jumping back and forth within this region. Therefore, the algorithm is adapted slightly by holding off this updating process until four iterations after the cuts were constructed. In this way the bounds are tighter, such that the approximation now only forms a statistically valid lower bound to the objective function in the limit. In practice, this helps the algorithm to converge more easily. For more information on this adaptation the reader is referred to [34]. Quick convergence is especially relevant when complex data profiles are defined, as extra sampling in such cases is costly.

Table 1. Overview of notation.

  Supply of a DG of type i in period t
  s_t  Total supply in period t
  b_t  Storage level at the end of period t
  w_t  Exported electricity in period t (dv)
  g_t  Imported electricity in period t (dv)
  l_t  Charging amount in period t (dv)
  u_t  Discharging amount in period t (dv)
  Installed capacity for DG of type i (dv)
  Note: dv = decision variable.

¹ Other objectives than minimizing losses could have been considered. The choice of objective may have substantial impact on the calculated optimal capacity. However, recommendations with respect to the appropriate granularity of data as input for optimization models are generalizable to other objectives dealing with the average performance of the system, such as minimizing system costs.
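Using the notation of Table 1, the second-stage balancing can be illustrated with a simple greedy policy: store surplus first, discharge first on deficit, and use the grid for the remainder. The paper solves this stage as an embedded optimization; the rule and parameter values below are an illustrative simplification:

```python
def balance(demand, supply, b_max=2000.0, alpha=0.975, beta=1e-4, b0=0.0):
    """Greedy balancing of one scenario; returns total losses (kWh).
    Variables follow Table 1: l/u = (dis)charging, w/g = export/import,
    b = storage level. alpha is the one-way storage efficiency and beta
    an assumed quadratic grid-loss parameter."""
    b, losses = b0, 0.0
    for d, s in zip(demand, supply):
        excess = s - d
        if excess >= 0:
            l = min(excess, (b_max - b) / alpha)   # charge what fits
            w = excess - l                          # export the rest
            b += alpha * l
            losses += (1 - alpha) * l + beta * w ** 2
        else:
            u = min(-excess, b * alpha)             # discharge what is there
            g = -excess - u                         # import the rest
            b -= u / alpha
            losses += (1 / alpha - 1) * u + beta * g ** 2
    return losses
```

For example, shifting one kWh of surplus to a later deficit through the storage costs roughly the roundtrip inefficiency, while exporting it instead incurs the quadratic grid loss; the optimal policy trades these off, which the greedy rule only approximates.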

Data
To evaluate the degree of data accuracy relevant for optimal DG planning, the model defined in Section 2 needs to be evaluated using data at different granularities. In this section, the construction of the energy use and generation data and the assumptions on the parameters in the model are explained. The information underlying the data profiles stems from earlier work at TNO [12] and auxiliary sources [10,11].

Construction of energy profiles
An energy profile is defined as an ordered collection of data on electricity use and generation over a specified span of time (as opposed to a static 'power' measurement). One can distinguish two approaches to constructing such an energy profile: a bottom-up and a top-down approach. Using the bottom-up approach, one starts with the characteristics underlying demand or supply and builds a model based on these characteristics. An example is the model for domestic energy use and PV generation presented in Refs. [10,11]. Domestic energy use is retrieved by modeling the presence of people in the house and consequently the electric appliances they will use. Adding up the energy use of all the different appliances gives a domestic energy use profile. Such approaches allow demand and supply to be modeled very accurately; however, these models are computationally expensive.
Taking a top-down approach, electricity use and generation are modeled directly, without reference to underlying elements, for example using a stochastic process. In the case of energy use, this implies that total energy use is modeled directly, rather than being retrieved from several models dealing with the separate appliances. These top-down models are often based on definitions of data demand in communication networks, such as Markov chains [35] or stochastic arrival curves [36]. Alternatively, rather than modeling a process, one can use coarse energy profiles and define a single known distribution for the variations within those wide time frames [37,38]. However, these models cannot capture all characteristics of demand and supply. In most definitions it is, for example, hard to include the non-stationarity of the processes, such that one needs to define separate processes for different parts of the day, as in Ref. [8].
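As a minimal illustration of such a top-down definition, a two-state Markov chain can generate a load trace directly, without any appliance-level modeling. The states, load levels, and transition probability below are invented for illustration; the models in [35] are considerably richer:

```python
import numpy as np

def markov_load(n_steps, p_stay=0.9, levels=(0.3, 1.5), seed=0):
    """Top-down load trace (kW) from a two-state Markov chain:
    state 0 = base load, state 1 = high activity (illustrative values)."""
    rng = np.random.default_rng(seed)
    state, trace = 0, []
    for _ in range(n_steps):
        if rng.random() > p_stay:        # leave the current state
            state = 1 - state
        trace.append(levels[state])
    return np.array(trace)

trace = markov_load(1440)                 # one day of 1-minute loads
print(trace[:10])
```

Because state transitions are rare (here 10% per step), the trace exhibits the short-run persistence that distinguishes a process model from independently drawn fluctuations.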
Two types of distributed generators are considered: (1) a photovoltaic array (PV) sized at 8 m² and (2) a micro combined heat and power system (micro-CHP) with a capacity of 1 kW. For these generators, supply profiles are defined at a granularity of one hour, fifteen minutes, and one minute. Similarly, neighborhood demand profiles are defined at these three levels of granularity. Neighborhood demand and PV generation are assumed to be stochastic. The output from the micro-CHP systems is assumed to be deterministic: the system is either on or off, and when it is on, it produces a constant level of energy.
In the context of stochastic optimization it is natural to make use of top-down energy profiles, since these can be defined as a collection of (joint) random variables. However, given the higher descriptive value of bottom-up energy profiles, calculations using a set of bottom-up energy profiles are provided to test the validity of the conclusions derived from the top-down profiles. All profiles contain information about one day in each season. The losses occurring on each of these days are combined using a weighted sum, so as to obtain a measure of yearly losses.
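The weighted-sum step can be sketched as follows, weighting one representative day per season by the number of days that season covers; the per-day loss figures are hypothetical:

```python
# Hypothetical per-day losses (kWh) for one representative day per season,
# scaled by the number of days each season covers (non-leap year).
day_losses = {"winter": 12.0, "spring": 9.5, "summer": 8.0, "autumn": 10.5}
season_days = {"winter": 90, "spring": 92, "summer": 92, "autumn": 91}

yearly = sum(day_losses[s] * season_days[s] for s in day_losses)
print("yearly losses (kWh):", yearly)
```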

Top-down profiles
To create the top-down profiles, an hourly base profile of energy use and generation is supplemented with intra-hour distributions. The data underlying these profiles stem from a research project on demand steering at TNO [39]. In Fig. 2, the hourly base profiles for demand and generation are shown for each season.
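A sketch of this construction, assuming zero-mean Normal intra-hour fluctuations around each hourly mean (the skewed Gamma alternative considered later would replace the sampling distribution); the base values and noise level are invented:

```python
import numpy as np

def refine(hourly, step_min, sigma=0.1, seed=0):
    """Expand an hourly base profile to `step_min`-minute resolution by
    adding zero-mean Normal intra-hour fluctuations around each hourly
    mean (an assumed distribution, for illustration)."""
    rng = np.random.default_rng(seed)
    per_hour = 60 // step_min
    fine = np.repeat(hourly, per_hour)
    return fine + rng.normal(0.0, sigma, fine.size)

base = np.array([0.4, 0.5, 0.9, 0.7])      # illustrative hourly means (kW)
quarter = refine(base, 15)                  # 16 fifteen-minute values
print(quarter.round(2))
```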
The micro-CHP system considered is one that has been on the market since 2010. It is heat demand following and is equipped with a buffer to store overproduced heat. When the micro-CHP system starts producing heat, it takes 15 min before it starts generating electricity, after which it generates at a constant rate of 1 kW.

Bottom-up profiles
The bottom-up profiles are defined using a simulator created by Richardson et al. [10,11]. With this simulator, one hundred traces of neighborhood demand and PV supply are retrieved at a one minute granularity for the first day of each season. The parameters were set such that the setting resembles that of the data for the top-down profiles retrieved at TNO. By aggregating these one minute demand and supply figures, comparable traces at a fifteen and sixty minute granularity are generated. For the micro-CHP output, the hourly base profiles defined above are used.
Each simulation starts by running an irradiance model that provides the radiation of the sun if the sky were completely clear all day. After that, the clearness index of the sky is simulated in order to account for the instantaneous effects of clouds. From this the output profiles for a single PV panel are retrieved. Next, N_h traces of household demand are simulated. When summed, these traces give the neighborhood demand. The household size can be varied between one and five. The numbers of traces simulated for each household size were chosen in accordance with numbers from Statistics Netherlands (statline.cbs.nl) on household sizes in the Netherlands, see Table 2. Through the incorporation of lighting demand, the household demands are correlated with the output of the PV panels. In Fig. 3, one sample of neighborhood demand and PV generation is shown for the first day of each season, assuming every household installs a PV system.
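The simulation chain described above (clear-sky irradiance modulated by a stochastic clearness index) can be sketched as follows; this is an illustrative stand-in for the Richardson et al. simulator, not a reproduction of it:

```python
import numpy as np

def pv_output(n_min=1440, peak_kw=1.0, seed=0):
    """Clear-sky bell curve (daylight 6:00-18:00) modulated by a simulated
    clearness index: a bounded random walk, so cloudiness persists from
    one minute to the next. All parameter values are invented."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_min)
    clear_sky = np.maximum(0.0, np.sin((t - 360) / 720 * np.pi))
    clearness = np.empty(n_min)
    k = 0.8
    for i in range(n_min):
        k = float(np.clip(k + rng.normal(0.0, 0.02), 0.05, 1.0))
        clearness[i] = k
    return peak_kw * clear_sky * clearness
```

The random-walk clearness index is what gives the trace the autocorrelation discussed later: a cloudy minute is very likely followed by another cloudy minute.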

Assumptions on parameters
To perform the calculations, assumptions need to be made with respect to the efficiency and capacity of the storage unit, and with respect to the efficiency of the grid. These assumptions are summarized in Table 3.
The neighborhood is assumed to have access to the futuristic storage device StorageX as presented in Ref. [12], which has a roundtrip efficiency of 95%. Assuming these losses are equally divided between loading and unloading, we define a_l = a_u = 0.975.
This storage unit has a capacity of 2000 kWh, hence b_max = 2000.
Variable transport losses in the Netherlands are estimated to be 4.35% under the scenario that hardly any distributed generation is installed [39]. This figure can be translated into the quadratic loss parameters b_w and b_g. Since the energy transported per period scales with the length of the time step, it follows that the values of b_w and b_g depend on the granularity of the input.
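The paper's exact calibration formula is not reproduced above; one consistent reading, sketched below under stated assumptions, is that with per-period losses quadratic in the flow, requiring losses to equal 4.35% of the flow at the average per-period flow fixes the parameter, and since the flow per period scales with the step length, the parameter grows as steps get shorter. The hourly mean flow of 100 kWh is a made-up illustration:

```python
def beta_for_step(step_min, loss_fraction=0.0435, mean_flow_hourly=100.0):
    """Calibrate a quadratic loss parameter (an assumed scheme, not the
    paper's exact formula): with losses = beta * flow**2 per period,
    requiring losses == loss_fraction * flow at the mean per-period flow
    gives beta = loss_fraction / mean_flow. The mean flow per period
    scales with the step length, so beta depends on granularity."""
    mean_flow = mean_flow_hourly * step_min / 60.0
    return loss_fraction / mean_flow

for step in (60, 15, 1):
    print(step, beta_for_step(step))
```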

Baseline results
In Table 4, the results from the optimization are shown. The first line in each panel shows the optimal solutions given the baseline profiles (hourly time steps). The second and third lines show the results when data is defined at a finer granularity using intra-hour distributions. The optimal solutions were found using the adapted Stochastic Decomposition algorithm.

Fig. 3. Simulated profiles for a district with 250 households.
Table 3. Parameter settings.

Stochastic Decomposition is a stochastic algorithm and will thus not yield the exact same solution each time it is run. This is usually not a problem; however, the goal of this paper is to compare the solutions retrieved using different input profiles. In order not to confuse differences in solutions due to the granularity of the data with differences due to the nature of the algorithm, some extra steps are taken. Every calculation is repeated five times. For the retrieved optimal solutions, the objective is re-estimated using 500 samples of demand and supply. Among these five solutions, the one with the lowest re-estimated losses is chosen. Furthermore, the neighborhood of this solution is searched to find the range of solutions for which the estimated losses are no more than 0.005 percentage points higher than the estimated losses of the optimal solution. These solutions will be referred to as near-optimal solutions and are denoted between curly brackets in the tables. For every solution, not only the model estimate of the objective is provided (column 4), but also the estimates based on the data profiles at the other time granularities (columns 5 to 7). When the hourly base profiles are used for optimization, the optimal solution indicates installing all available capacity of PV panels and about 62.5% of the available capacity of micro-CHP. The estimated losses are 1.64% of total energy demand, which is less than half of the losses when no DGs are included (4.35%, see Section 3.3). When considering the intra-hour fluctuations, the optimal installed capacity of micro-CHP increases. For the profiles constructed with one minute time steps, the resulting optimal capacity levels are about eight percentage points higher than those resulting from the hourly base profiles. The estimated losses of the model increase as the granularity gets finer.
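The repeat-and-select procedure can be sketched as follows; the candidate solutions and loss figures are invented for illustration, and `reestimate` stands in for the 500-sample re-estimation:

```python
import numpy as np

def select_solution(solutions, reestimate, tol=0.005):
    """Re-estimate the objective for each candidate solution, keep the
    best, and collect the near-optimal set within `tol` percentage
    points of it (mimicking the selection step described in the text)."""
    losses = [reestimate(s) for s in solutions]
    i_best = int(np.argmin(losses))
    near = [s for s, l in zip(solutions, losses) if l <= losses[i_best] + tol]
    return solutions[i_best], near

# Illustrative (PV, micro-CHP) solutions from five repeated runs, with
# made-up re-estimated losses (% of demand).
runs = [(250, 150), (248, 152), (250, 148), (246, 155), (249, 151)]
fake_losses = dict(zip(runs, [1.641, 1.643, 1.648, 1.644, 1.642]))
best, near = select_solution(runs, fake_losses.get)
print(best, near)
```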
One may notice that the optimal micro-CHP capacities resulting from the fifteen minute profiles fall within the range of near-optimal solutions retrieved using the base profiles. Not surprisingly, re-estimating the losses for those two solutions using the fifteen minute data results in the same amount of losses. Under the assumption of normally distributed intra-hour fluctuations, the same holds when moving to minute time steps: the increase in micro-CHP capacity does not lead to substantially lower losses. Under the assumption of skewed (Gamma) intra-hour fluctuations, there does seem to be a slight improvement from the new solutions. When re-estimated using the one minute data, the losses are 0.02 percentage points lower when using the one minute optimal solution rather than the optimal solution from the baseline.

Sensitivity to storage assumptions
The assumptions on the battery efficiency levels may impact the conclusions with respect to the granularity of data. As Notton et al. [40] explain, when assuming access to storage, modeling on a finer timescale only matters if indeed within that timeframe the excess supply is sometimes positive and sometimes negative. So if within the hour demand always exceeds supply, the losses are not expected to be impacted by the granularity of the data.
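This condition is easy to check on a given trace: only hours in which the 1-minute excess supply changes sign can be affected by sub-hourly granularity. A sketch, on a synthetic trace:

```python
import numpy as np

def hours_with_sign_change(excess_1min):
    """Count hours in which the 1-minute excess supply (supply - demand)
    changes sign; only in those hours can sub-hourly granularity alter
    storage use and loss estimates."""
    hours = excess_1min.reshape(-1, 60)
    return int(np.sum((hours.max(axis=1) > 0) & (hours.min(axis=1) < 0)))

# Example: excess is negative all night and fluctuates around zero midday.
t = np.arange(1440)
excess = np.where((t > 600) & (t < 900), np.sin(t / 7.0), -0.5)
print(hours_with_sign_change(excess))
```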
The calculations are repeated under the assumption that no storage is available. In fact, it can be shown that when the storage unit has a roundtrip efficiency of about 80% rather than 95%, storage is hardly used given the optimal control policy imposed. The results of the calculations without storage are presented in Table 5.
The optimal capacity levels are indeed somewhat lower than in the case with storage. When the hourly base profiles are taken, the optimal level of PV capacity is 2.5 percentage points lower and the optimal level of micro-CHP capacity is about 10 percentage points lower. The estimated losses are quite a bit higher than in the situation with storage, but still far lower than in the case of no DG (2.09% rather than 4.35% of total demand).
In contrast to the case including storage, when moving from the 15 min to the 1 min level the optimal PV capacity drops by about 20 percentage points. This decrease in optimal installed PV capacity is slightly compensated by an increase in the optimal micro-CHP capacity. So there is a shift from the unpredictable supply source, which may lead to large peaks in export or import, to the more stable source of generation. The improvement in losses (estimated at the minute level) from using the fine grained solution rather than the coarse solution is now about 0.02–0.03 percentage points. This is a slightly bigger improvement than in the case with storage, but it is still not substantial.

Stochastic bottom-up profiles
In Table 6, the results of the calculations using the simulated generation and demand profiles are shown. The retrieved optimal PV capacity levels are lower than in the calculations in Section 4.1, whereas the optimal micro-CHP capacity levels are higher. Also, the range of near-optimal solutions is smaller and the estimated losses are lower.
The effect of the granularity of data on the optimal solutions is similar to what was found using the top-down profiles. The optimal installed PV capacity levels decrease as the granularity of the data gets finer, and this decrease is compensated by an increase in optimal installed micro-CHP capacity levels. However, in this case the differences are so small that the losses, re-estimated at the one minute level, are the same for the solutions retrieved at the three different time granularities. The ranges of near-optimal solutions increase when using higher resolution data, indicating that the objective function gets flatter around the optimum. When assuming there is no access to efficient storage, the improvements due to the finer granularity of data are more visible, see Table 7. However, the improvement in estimated losses gained from using finer grained data is still only 0.02 percentage points, similar to the results retrieved using the top-down data. Even though the pattern in the change of optimal solutions is similar using the high resolution data, it is noticeable that the bias in estimated losses is much smaller. Apart from the data being constructed in a different way, the main difference between the top-down data and the simulated data is the lack of imposed autocorrelation and cross correlations in the top-down data profiles. Although demand and supply are very unpredictable and may fluctuate from minute to minute, current levels are likely to depend on those in the periods before. So when supply exceeds demand in a certain period, one can expect it to be higher also in the next couple of periods, especially when small time steps are taken. When within the larger time frame the system is either continuously in overproduction or continuously in underproduction, modeling on a finer timescale does not give extra information regarding losses.

Table 4. The optimal capacity levels and estimated losses.
In Fig. 4, one sample of excess supply in the district is shown when the PV system is installed at 125 of the houses in the district and the micro-CHP system at 150 of the houses (respectively 50% and 60%, the optimal solution for the case without storage based on 15 min data). The gray line shows the original data and the black line shows the data aggregated to 15 min time steps. Indeed, compared to the variation over the day, the fluctuations around the 15 min line are not that large.
In other research based on data from the same simulator [16], it is shown that aggregating data to hourly profiles leads to highly biased estimates of the mismatch between demand and supply, contrary to the results here. This can partly be explained by the fact that the focus in that paper is on single houses rather than districts, so that demand shows more pronounced peaks at the minute level. More importantly, however, those results are based on four 24 h electrical demand profiles and two 24 h PV generation profiles. In Fig. 5, three samples of PV output are shown. These profiles show considerable variation. Looking at the spring profiles, one shows barely any generation (a cloudy day appears to be simulated), while the other two show high peaks, though at different points of the day. When looking at only a couple of snapshots of the system, outliers may influence the results and biases may be exaggerated. It seems that when looking at the average performance of the system given the whole range of possible situations, such problems are mitigated.

Table 5. The optimal capacity levels and estimated losses, no storage.
Table 6. The optimal capacity levels and estimated losses given simulated data.
Table 7. The optimal capacity levels and estimated losses given simulated data, no storage.

Conclusion
In this paper the impact of using higher resolution data on the optimal planning of distributed generation is evaluated. There is an increased effort towards constructing energy use and generation profiles with small time steps. However, research findings on the usefulness of those high resolution profiles have not been conclusive. This paper aims to provide guidance with respect to the appropriate granularity of data. Using a general stochastic loss minimization model, the consequences of using data with time steps smaller than one hour have been evaluated. The consequences for modeling flexibility have been discussed and optimal solutions have been compared for energy use and generation profiles at multiple levels of granularity.
In the model used for the numerical comparisons, a PV panel and a micro-CHP system could be installed at households in a typical Dutch neighborhood. Since short-run fluctuations in renewable generation and demand are highly uncertain, energy profiles should contain a stochastic dimension. Even when considering only a couple of possible scenarios this implies a huge computational effort. At first glance, the optimization results do seem to indicate that optimal capacity levels change when the length of the time steps decreases: optimal PV capacity levels decrease, whereas optimal micro-CHP capacity levels increase. However, when looking at the estimated losses given the different solutions, the changes are minor. In fact, when evaluated at a fine granularity the objective is quite flat, so that many solutions give approximately the same estimated losses. Estimating losses with coarse data, however, does lead to large overestimations. The results from the simulated profiles suggest that, when sufficiently accounting for cross- and autocorrelations, hourly inputs do not lead to largely different estimated losses than minute data. This is contrary to results from earlier research based on the same data, though that research did not take a stochastic approach. Still, the objective is flatter when using fine-grained input.
The results suggest that for optimization purposes it is not necessary to use fine-grained data. In fact, the high resolution data show that many solutions are similar in outcome, such that even near-optimal solutions can give satisfactory results. Considering the computational burden and the limits to modeling flexibility that come with using high resolution data, it is thus advised not to use data with time steps smaller than one hour for optimization. However, when evaluating the current state of a system rather than optimizing the system, it may be relevant to increase granularity. In that case it is advised to acknowledge the full spectrum of the probabilistic nature of the variables, rather than just a couple of scenarios, so that the process is less prone to being influenced by outliers in the samples. Also, when the objective is not to optimize some average performance of the system (cost effectiveness, real losses, etc.) but to increase performance under worst-case scenarios (reliability), the short-term fluctuations may be important for the process of optimization.
Our analysis was guided by a model of loss minimization in a residential setting. Therefore, highly intermittent types of DG, such as wind, were not included in the analysis. Considering types of DG that fluctuate more than PV or CHP will likely lead to higher bias in the performance measure. However, as noted above, such bias in the performance measure does not seem to translate into suboptimal capacity levels. The inclusion of more intermittent types of DG is therefore not likely to influence the conclusions with respect to the appropriate data granularity for optimization. On a similar note, the choice of loss function and storage policy is likely to affect optimal capacity levels, but not the influence of data granularity on those optimal capacity levels. For expositional purposes it was chosen to work with a model that does not carry too many complexities in dimensions other than the granularity of the data. Given our conclusions it does not seem profitable to develop a model that includes both the complexity due to increased data granularity and the complexities of other possible grid configurations. It is more profitable to focus on adding complexity that makes the model itself more realistic, rather than adding complexities that allow the model to use more realistic data.
It should be noted that the assumptions regarding losses were based on information about the Dutch grid, which is relatively efficient compared to that of other countries. Even so, a 0.01% improvement in losses comes down to about 500,000 euro worth of electricity in the Dutch grid, and when considering a country with a larger land area, like the United States, a small percentage saving in losses can lead to quite substantial financial gains. Moreover, a highly efficient storage unit was assumed, which does not exist yet. Given these considerations, it is likely that the potential effect of granularity found in this study is on the conservative side. However, given the very small gains from using one-minute data rather than hourly data, the improvements are not expected to be substantial in less efficient settings either.

b_t, w_t, l_t, g_t, u_t ≥ 0,   t = 1, …, T.   (A.8)

Note that by looking at the district as a whole, information on losses from transport within the district, that is, transport from one house to another, is excluded. These losses are negligible [41].

Appendix A.3. Simplifying the optimal balancing process
The balancing process to minimize energy losses is quite complex, in the sense that in every time period a decision needs to be taken. When increasing the time granularity of the input (the energy profiles), more and more decisions are added to the model. Moreover, the stochastic nature of the fluctuations brings forward the need to evaluate many different scenarios. As a consequence the model quickly becomes difficult to solve using standard techniques. Consider, for example, that one wants to evaluate a problem with one type of generator for one day in each season. In the case of hourly deterministic time steps one needs to determine w_t, l_t, g_t, and u_t 96 times. Now, suppose that at the one-minute level the output of this generator can be approximated by a discrete distribution of S_1 values. Similarly, let demand be represented by a discrete distribution with S_2 possible realizations, such that there are S_1 S_2 combinations of the two. Then there will in total be (S_1 S_2)^T possible scenarios to evaluate. When evaluating one day per season at the level of minutes, T equals 5760. It is clear that the number of scenarios retrieved in this way, and with that the deterministic equivalent of the stochastic model, is quite large.
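The scale of this scenario explosion can be made concrete with a few lines of arithmetic. The distribution sizes S_1 = S_2 = 3 below are an assumed illustration; the horizon lengths follow the example in the text:

```python
import math

S1, S2 = 3, 3          # assumed sizes of the two discrete distributions
T_hourly = 24 * 4      # one day per season at hourly steps -> 96 decision periods
T_minute = 1440 * 4    # the same horizon at one-minute steps -> 5760 periods

# (S1*S2)^T is far too large to enumerate, so count its decimal digits instead.
digits = T_minute * math.log10(S1 * S2)
print(f"hourly periods: {T_hourly}, minute periods: {T_minute}")
print(f"(S1*S2)^T has roughly {digits:.0f} decimal digits")
```

Even with only three outcomes per distribution, the deterministic equivalent at one-minute resolution has a scenario tree whose size has thousands of digits, which is why the control policy approximation below is introduced.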
By means of approximation, a control policy is imposed to guide the second-stage decisions. In order to define an optimal control policy in this setting, one needs to assume that it is not possible to configure the storage unit such that future events can be anticipated. Under this assumption it can easily be checked that in each period the excess (or shortage) of energy should first be compensated for by exporting it to (or importing it from) the grid. When the level of excess supply (or demand) is so high that the marginal losses from export (or import) exceed the marginal losses from storage, the storage unit is activated. When the storage unit reaches full capacity (or is depleted), any remaining excess supply (or demand) should be exported (or imported). The mathematical formulation of this control policy can be found in Table A.8. By applying this control policy, the two-stage recourse problem is transformed into a problem of the form

min_{x∈X} E_u[g(x,u)],   (A.9)

where g(x,u) is given by a direct expression rather than by an optimization problem, such that only the value of x needs to be determined by the model, rather than the values of x, l, u, g, and w.

Table A.8. The optimal second-stage control policy, where Δ_t := s_t − d_t (columns: w_t, g_t, l_t, u_t, b_t).
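A minimal sketch of one period of this control policy is given below. The threshold `q_threshold` (the exchange level at which the marginal grid loss equals the marginal storage loss) and the storage efficiency `eta` are hypothetical parameters; the exact formulation used in the paper is the one in Table A.8:

```python
def balance_step(delta, soc, capacity, q_threshold, eta=0.95):
    """One period of the greedy control policy sketched in the text.

    delta: net excess supply s_t - d_t (negative = shortage)
    soc:   current state of charge of the storage unit
    capacity: storage capacity
    q_threshold: assumed exchange level at which the marginal grid loss
                 equals the marginal storage loss (hypothetical value)
    eta:   assumed storage charging efficiency (hypothetical value)
    Returns (grid_exchange, storage_flow, new_soc), signed like delta.
    """
    sign = 1 if delta >= 0 else -1
    amount = abs(delta)
    # 1) exchange with the grid up to the loss-equalizing threshold
    grid = min(amount, q_threshold)
    remaining = amount - grid
    # 2) route the rest through storage, as far as capacity/charge allows
    if sign > 0:                        # excess supply -> charge
        store = min(remaining, capacity - soc)
        soc += eta * store
    else:                               # shortage -> discharge
        store = min(remaining, soc)
        soc -= store
    # 3) any leftover is exchanged with the grid after all
    grid += remaining - store
    return sign * grid, sign * store, soc
```

Because the policy looks only at the current period, evaluating g(x,u) for a scenario reduces to a single forward pass over the time steps, with no embedded optimization.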