Machine Learning Based Method for Estimating Energy Losses in Large-Scale Unbalanced Distribution Systems with Photovoltaics

Mahmoud, Karar; Abdelnasser, Mohamed; Kashef, Heba; Puig, Domenec; Lehtonen, Matti


I. Introduction
The sharp increase in electricity demand can no longer be met by non-renewable energy sources [1]. Hybrid renewable energy systems (e.g., solar and wind) are among the most widely employed renewable sources for meeting this demand, and they are also environmentally friendly [2], [3]. The consumption of renewable energy also has a positive impact on economic growth [4]. However, the estimation of losses in distribution systems is affected by the fluctuating output power of renewable energy sources. In particular, the impact of photovoltaic (PV) fluctuation cannot be ignored, owing to the high PV capacity in large power grids.
The impact of PV on the electrical distribution network can be analyzed by comparing the system output before and after connecting the PV source under different PV scenarios. In [5], the effect of PV on losses and voltages is studied using the DIgSILENT PowerFactory software. That analysis relies on balancing the load against the production of the PV system: when PV generation exceeds the load power, the surplus is exported to the grid; when the load power exceeds PV generation, the deficit is supplied by the grid.
In the literature, a day-ahead loss estimation method based on data mining with insufficient historical data is proposed in [6]. This method creates a similar-day matrix obtained from a statistical analysis of different weather conditions. The impact of the PV system in a low voltage network has been tested through three scenarios [7]. According to that study, the PV penetration should not exceed 50% of the total load; otherwise, it contributes to voltage unbalance and high network losses. To achieve the lowest PV penetration, the PV units should be allocated along the feeders. The deviation of the maximum power point (MPP) in a grid-connected PV system occurs because of loss factors caused by variations in frequency-voltage, irradiance, DC load, and solar cell characteristics [8]. Indeed, the penetration level of the PV system greatly affects the system losses. In [9], three tests are applied to different IEEE systems (the 13-, 30-, and 69-bus systems) with four different simulation cases. To reduce the losses, the PV system is placed at the bus with the peak value rather than the average value. Installing different PV units in a system can contribute to increased energy losses and voltage fluctuation. In [10], the annual energy losses with variable generators are computed, and the impact of different types of distributed generators (DGs) on energy losses is analyzed.
Several methods are used for power flow calculations in distribution systems, e.g., the Newton-Raphson and Gauss-Seidel methods, which are used for non-linear loss calculations [11]-[13]. A fast and accurate method for loss calculation in balanced distribution systems, based on machine learning techniques, is discussed in [14]. Its model is constructed using the regression tree technique for various generation and load profiles. Another machine learning method, which uses neural networks for state estimation of the system, is discussed in [15]. This method is applied to small-scale balanced distribution systems without renewable energy. Although these machine learning-based methods overcome the computational burden of the iterative methods, they were applied only to small systems.
Modern distribution systems require simulation algorithms for estimating energy losses in the presence of renewable energy sources, such as PV. In this paper, we propose a machine learning-based method for performing real-time simulations of unbalanced power distribution systems with PV. In our approach, a neural network model computes the losses of large-scale systems with high accuracy in a very short time, without the iterations required by the existing methods. The proposed method is applied to a large-scale unbalanced distribution system (the IEEE 906 Bus European LV Test Feeder) with a grid-connected PV unit.
The rest of this paper is organized as follows. Section II explains the proposed methodology. Section III presents the results. Section IV concludes the paper and provides some lines of future work.

A. Data Structure and Preparation
In the proposed method, machine learning algorithms are used to model the relationship between the inputs and their corresponding outputs. In our case, the inputs are the active and reactive power profiles of all loads and PV units for the three phases, on a per-unit scale, at different time instants. The output of the machine learning algorithm is the power losses for all the branches of the distribution system. If the output is represented by a matrix $PL_{ph}$, as expressed in (3), each element of this matrix is the total system losses at time instant $t_m$ for phase $ph$ of the distribution system.
Note that a power flow tool is required to construct the output matrix $PL_{ph}$ from the input matrices $P_{ph}$ and $Q_{ph}$. Several tools are available for loss calculations in distribution systems. OpenDSS supports the frequency-domain analyses relevant to smart grids with renewable energy systems. To validate our approach, we employed the OpenDSS software [16] as a benchmark.
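The data layout described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_features`, `build_targets`, and `toy_losses` are hypothetical, and `toy_losses` is only a placeholder for the power flow tool (OpenDSS in this paper), not a power flow solver. The sketch assumes one feature row per time instant, holding the per-unit active and reactive powers, and one target row per time instant, holding the total losses of phases A, B, and C.

```python
from typing import Callable, List, Sequence

Row = List[float]


def build_features(p_profiles: Sequence[Sequence[float]],
                   q_profiles: Sequence[Sequence[float]]) -> List[Row]:
    """Stack per-unit P and Q values into one feature row per time instant.

    p_profiles[m][k] is the active power of element k (load or PV) at time
    instant t_m; q_profiles has the same shape for reactive power.
    """
    if len(p_profiles) != len(q_profiles):
        raise ValueError("P and Q must cover the same time instants")
    return [list(p) + list(q) for p, q in zip(p_profiles, q_profiles)]


def build_targets(features: Sequence[Row],
                  loss_solver: Callable[[Row], Row]) -> List[Row]:
    """Call the (external) loss solver once per time instant."""
    return [loss_solver(row) for row in features]


def toy_losses(row: Row) -> Row:
    """Hypothetical placeholder for the power flow tool, NOT a real solver.

    Returns made-up losses for phases A, B, C that grow with the squared
    injections, only so that the sketch can be executed end to end.
    """
    s = sum(x * x for x in row)
    return [0.010 * s, 0.012 * s, 0.008 * s]


if __name__ == "__main__":
    P = [[0.5, 0.2], [0.4, 0.1]]   # two time instants, two elements
    Q = [[0.1, 0.0], [0.2, 0.05]]
    X = build_features(P, Q)
    Y = build_targets(X, toy_losses)
    print(len(X), len(X[0]), len(Y), len(Y[0]))
```

In practice, `loss_solver` would wrap a call to the power flow tool, so that each target row is the exact loss value that the learned model is later asked to reproduce.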

B. Constructing A Machine Learning Model
Many machine learning algorithms can be used to model the energy losses in distribution systems, for example, regression trees, Gaussian processes, logistic regression, support vector machines, and XGBoost. In this paper, we use neural networks because they are simple to use and because neural network tools with a graphical user interface (GUI) are readily available, which supports the reproducibility of the studied cases.
In short, neural networks can deal with complex systems, and so they are widely used in data modelling and statistical analysis [17]. The network is trained by adjusting the weights and biases until the error falls to a set threshold. Learning techniques for neural networks are based on minimizing the error between the network output and the desired target. Different learning techniques exist, such as feed-forward backpropagation (where the errors are propagated back toward the network input until the network goal is achieved) and cascade-forward backpropagation.
Here, we describe the architecture of the neural network model and how it is trained. Fig. 1 shows the neural network model used for the IEEE 906 Bus European LV Test Feeder. The model includes one hidden layer (10 neurons) and one output layer. Note that the dimension of the input is 907, which represents the loads, and the output is the total system losses. We build the model as a feed-forward neural network with initial parameters. The network is trained with the Levenberg-Marquardt (LM) algorithm [18], [19], which has high efficiency. In the training phase, the actual losses are compared with the model output until the network stopping goal is met (a maximum of 300 epochs, a minimum gradient of 5%, or a goal error of 1e-3). If the set goal is not met, the weights and biases are updated according to the learning rate until the error is equal to or less than the set goal. We use the MATLAB Neural Network Toolbox to construct the neural network model.
Fig. 2 shows the steps for loss estimation using a neural network model in distribution systems with PV. The model is constructed using a synthetic dataset (a one-month dataset of loads and PV generation). Specifically, we generate the possible load scenarios (43200 load points with a 15-minute time step). Another option is to use the reliable PV forecasting models proposed in [20], [21] to generate the datasets. The corresponding power losses of these datasets are computed offline using OpenDSS. The input data (load factors and PV power) and the output data (power losses) of OpenDSS are fed into the neural network to construct the model. Once the training process is completed, the model can estimate the power losses for new input data rapidly and accurately, without the iterative processes employed in state-of-the-art methods.
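The architecture and stopping rules described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the model used in the paper. The class name `TinyMLP` is hypothetical, the tanh hidden activation and linear output are assumed, and plain full-batch gradient descent is swapped in for the Levenberg-Marquardt algorithm to keep the example short. Only the 10 hidden neurons, the 300-epoch limit, and the 1e-3 goal error come from the text; the gradient criterion is omitted.

```python
import math
import random
from typing import List, Sequence


class TinyMLP:
    """Feed-forward net: n_in inputs -> n_hidden tanh neurons -> 1 linear output."""

    def __init__(self, n_in: int, n_hidden: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: Sequence[float]) -> float:
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def mse(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

    def train(self, X: Sequence[Sequence[float]], y: Sequence[float],
              lr: float = 0.01, max_epochs: int = 300,
              goal: float = 1e-3) -> List[float]:
        """Gradient descent (assumed stand-in for LM); returns the MSE history."""
        n = len(X)
        history: List[float] = []
        for _ in range(max_epochs):
            err = self.mse(X, y)
            history.append(err)
            if err <= goal:          # goal error reached, stop early
                break
            # Accumulate full-batch gradients of the MSE.
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, t in zip(X, y):
                h = self._hidden(x)
                out = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
                d_out = 2.0 * (out - t) / n
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_h = d_out * self.w2[j] * (1.0 - hj * hj)  # tanh derivative
                    g_b1[j] += d_h
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_h * xi
            # Apply the update to every weight and bias.
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j]
                self.b1[j] -= lr * g_b1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i]
            self.b2 -= lr * g_b2
        return history


if __name__ == "__main__":
    X = [[i / 10.0, j / 10.0] for i in range(5) for j in range(5)]
    y = [1.0 + a * a + 0.5 * b for a, b in X]   # synthetic target, not real losses
    net = TinyMLP(n_in=2)
    hist = net.train(X, y)
    print(len(hist), hist[0], hist[-1])
```

For the test feeder in this paper, `n_in` would be 907 and the targets would be the losses computed offline by OpenDSS. A second-order method such as LM typically needs far fewer epochs than the gradient descent used in this sketch.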
For testing the proposed model, we use a one-day dataset (1440 samples) at six different resolutions.

C. Solution Steps
f) The output of the network is simulated and compared to the OpenDSS output to assess the accuracy of the proposed method.
g) Print the analytical and graphical results.

D. Evaluation Metrics
The efficacy of the proposed method is quantified by how close the estimated losses are to the exact ones calculated by OpenDSS. Here, different types of errors are computed:
• The mean square error (MSE), which is the average of the squared differences between the exact and estimated values of the power loss:
$$\mathrm{MSE}_{ph} = \frac{1}{L} \sum_{t=1}^{L} \left( PL_{ph}^{t} - \widehat{PL}_{ph}^{t} \right)^{2} \qquad (4)$$
where $PL_{ph}^{t}$ and $\widehat{PL}_{ph}^{t}$ are the exact and estimated losses of phase $ph$ at time instant $t$, and $L$ is the number of time instants.
• The root mean square error (RMSE), which is the square root of the average of the squared differences between the exact and estimated values:
$$\mathrm{RMSE}_{ph} = \sqrt{ \frac{1}{L} \sum_{t=1}^{L} \left( PL_{ph}^{t} - \widehat{PL}_{ph}^{t} \right)^{2} } \qquad (5)$$
• The mean absolute error (MAE), which is the average absolute difference between the two methods:
$$\mathrm{MAE}_{ph} = \frac{1}{L} \sum_{t=1}^{L} \left| PL_{ph}^{t} - \widehat{PL}_{ph}^{t} \right| \qquad (6)$$
• The mean absolute percentage error (MAPE), which expresses the accuracy of the proposed method as a percentage:
$$\mathrm{MAPE}_{ph} = \frac{100}{L} \sum_{t=1}^{L} \left| \frac{ PL_{ph}^{t} - \widehat{PL}_{ph}^{t} }{ PL_{ph}^{t} } \right| \qquad (7)$$
• The sum of squared errors (SSE), which measures the scale of the variation between the two methods:
$$\mathrm{SSE}_{ph} = \sum_{t=1}^{L} \left( PL_{ph}^{t} - \widehat{PL}_{ph}^{t} \right)^{2} \qquad (8)$$
• The relative error (RE) between the two methods, defined in (9).
All of these errors are calculated to evaluate the accuracy of the proposed method against the exact one. Using the different error formulae (4)-(9) allows the accuracy of the proposed method to be tested thoroughly.
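The error measures listed above can be computed as in the following sketch, which uses the standard textbook definitions of MSE, RMSE, MAE, MAPE, and SSE for one phase. The function name `loss_errors` and the sample numbers are illustrative assumptions; RE is left out because its exact definition is not recoverable from the text.

```python
import math
from typing import Dict, Sequence


def loss_errors(exact: Sequence[float],
                estimated: Sequence[float]) -> Dict[str, float]:
    """Standard error measures between exact and estimated loss series."""
    if len(exact) != len(estimated) or len(exact) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(e == 0 for e in exact):
        raise ValueError("MAPE is undefined when an exact value is zero")
    diffs = [e - p for e, p in zip(exact, estimated)]
    n = len(diffs)
    sse = sum(d * d for d in diffs)          # sum of squared errors
    mse = sse / n                            # mean square error
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(d) for d in diffs) / n,
        "MAPE": 100.0 * sum(abs(d / e) for d, e in zip(diffs, exact)) / n,
        "SSE": sse,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    errors = loss_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
    for name, value in errors.items():
        print(f"{name}: {value:.4f}")
```

Calling the function once per phase and per resolution yields the kind of per-phase error summary that the tables in the results section report.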

A. Test System and Dataset
Here, the performance of the proposed method in estimating energy losses is tested in the MATLAB environment. The experiments are run on an Intel® Core™ i5-5200U CPU @ 2.20 GHz with 4.00 GB of RAM and a 64-bit operating system. The IEEE European low voltage test feeder [22], with 907 buses and a 50 Hz frequency (Fig. 3), is used to validate the accuracy and the computational efficiency of the proposed method. Two PV units are connected at buses 639 and 906, and 55 single-phase loads with different daily load shapes (1440 load points at a one-minute step) are distributed along the system. To construct the offline loss model, we used a dataset of load and PV generation profiles that contains 43200 samples. To analyze the performance of the proposed method, we performed the following experiments:
• The power loss is analyzed at six different time resolutions for phases A, B, and C separately. The estimated losses are compared to the exact power losses computed by OpenDSS.
• For accuracy validation, MSE, RMSE, MAE, MAPE, and SSE are calculated for all three phases.
• To highlight the computational efficiency of the proposed method, its execution time for estimating the losses is measured and compared to the execution time of the exact approach.

B. Performance Analysis
The performance of the proposed method is compared to the exact iterative time-series power flow approach (OpenDSS) for phases A, B, and C. Specifically, we estimate the power loss profile for a day at six different time resolutions (1 min, 5 min, 10 min, 15 min, 30 min, and 1 hr). For example, the one-day datasets at the 1 min and 1 hr resolutions contain 1440 samples (24*60) and 24 samples, respectively. Fig. 4 shows the estimated losses at the 1 min and 1 hr resolutions using the proposed model and OpenDSS. The estimated losses during the day almost match those of the exact method at both resolutions. In addition, the loss profile at the 1 min resolution differs from that at the 1 hr resolution: higher fluctuations appear at 1 min than at 1 hr. This means that higher-resolution datasets are needed to represent the actual loss profiles adequately when the PV and load profiles are intermittent. However, the computational burden of the existing iterative methods increases when higher-resolution PV and load datasets must be analyzed. To solve this issue, the proposed method can accurately calculate the losses for large (high-resolution) datasets in a very short time, thanks to the model developed offline.
Table I, Table II, and Table III summarize the values of MSE, RMSE, MAE, MAPE, and SSE for phases A, B, and C, respectively, at the six resolutions. The values of all errors are very small relative to the exact losses. Further, they are lowest at the 1 hr resolution (the coarsest) and highest at the 1 min resolution (the finest). The lowest MSE value appears at the 1 hr resolution for phase A (6.1190e-07), while the largest appears at the 1 min resolution for phase B (3.2222e-04). The same trend is observed for MAE, MAPE, and SSE. For the three phases, the RE values are less than 0.05.
Note that for this test system, the estimated results of phase A are more accurate than those of phases B and C, but this is not a general rule for distribution systems.
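The sample counts at the six resolutions follow directly from the one-minute base profile, as the short sketch below shows. It assumes that a coarser resolution is obtained by keeping every k-th sample of the 1 min profile; the text does not state whether decimation or averaging was used, so this choice and the name `subsample` are assumptions.

```python
from typing import Dict, List, Sequence

RESOLUTIONS_MIN = (1, 5, 10, 15, 30, 60)   # the six resolutions, in minutes
MINUTES_PER_DAY = 24 * 60


def subsample(profile: Sequence[float], step_min: int) -> List[float]:
    """Keep every step_min-th sample of a 1-minute profile (assumed scheme)."""
    if step_min < 1:
        raise ValueError("step_min must be a positive integer")
    return list(profile[::step_min])


def samples_per_day(step_min: int) -> int:
    """Number of samples in one day at the given resolution."""
    return MINUTES_PER_DAY // step_min


if __name__ == "__main__":
    one_minute_profile = [float(m) for m in range(MINUTES_PER_DAY)]
    counts: Dict[int, int] = {
        step: len(subsample(one_minute_profile, step))
        for step in RESOLUTIONS_MIN
    }
    print(counts)
```

The counts range from 1440 samples at 1 min down to 24 samples at 1 hr, which is why the iterative power flow becomes markedly more expensive at the finest resolution.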

C. Computational Performance of the Proposed Method
To further illustrate the contribution of the proposed machine learning-based method, the execution times for computing the losses during the day at the six resolutions are measured for the proposed method and for the exact method (OpenDSS). Table IV shows the computational times of the two methods. OpenDSS requires a much longer execution time than the proposed method. At the 1 hr resolution, OpenDSS takes approximately 1.5 s, while the proposed method takes only 0.02 s, which is about 75 times faster. The execution time of OpenDSS increases greatly with the data resolution: at the 1 min resolution, OpenDSS takes around 41 s, whereas the proposed method takes less than 0.04 s, a speedup of roughly three orders of magnitude.
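A comparison of this kind can be set up with a small timing helper such as the one below. It is only a sketch: `timed` and `speedup` are hypothetical helper names, and the summation used in the demonstration is a trivial stand-in for a solver call, not OpenDSS or the trained model.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def speedup(baseline_s: float, proposed_s: float) -> float:
    """Ratio of the baseline time to the proposed method's time."""
    if proposed_s <= 0:
        raise ValueError("proposed_s must be positive")
    return baseline_s / proposed_s


if __name__ == "__main__":
    # Trivial stand-in workload, used only to exercise the helper.
    result, elapsed = timed(sum, range(1_000_000))
    print(result, f"{elapsed:.6f} s")
    # Ratios implied by the execution times quoted in the text.
    print(speedup(1.5, 0.02), speedup(41.0, 0.04))
```

With the times quoted in the text, the ratio is 75 at the 1 hr resolution and about 1025 at the 1 min resolution. The latter is a lower bound, since the proposed method is reported to take less than 0.04 s.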

D. Comparison
To demonstrate the performance of the NN model, we compare it with a support vector regression (SVR) model. Table V, Table VI, and Table VII show the MSE, RMSE, MAE, MAPE, SSE, and RE values of the SVR model for phases A, B, and C, respectively, at the six resolutions. For phase A, the SVR model achieves an MSE of 0.0028 and 3.3389e-04 at the 1 min and 1 hr resolutions, respectively. For phase B, it gives an MSE of 0.0031 and 3.8189e-04 at the 1 min and 1 hr resolutions, respectively. For phase C, the SVR model gives an MSE of less than 7e-04 at all resolutions. For the three phases, the RE values are less than 0.23, which is much higher than those of the NN model (RE values < 0.05). In general, comparing the errors of the NN model in Tables I-III with the errors of the SVR model in Tables V-VII reveals that the NN model achieves much lower prediction errors than the SVR model. Therefore, the NN model appears to be more suitable for this task.

IV. Conclusion
In this paper, an efficient method has been proposed for performing time-series simulations of unbalanced power distribution systems with PV. Unlike the related iterative methods, the proposed method is based on machine learning algorithms. It has been applied to the IEEE 906 Bus European LV Test Feeder with grid-connected PV units and validated against the OpenDSS software. The method has been tested at six different time resolutions (1 hr, 30 min, 15 min, 10 min, 5 min, and 1 min). The results of the trained model closely match those of OpenDSS. The calculation time required by OpenDSS for computing the losses is much longer than that of the proposed method, especially at high resolution (i.e., 1 min). The experimental results also show that the NN model outperforms the SVR model for time-series simulations. Overall, the results demonstrate the effectiveness of the proposed method.
The main goal of this work is to show that machine learning is a promising approach for estimating energy losses in large-scale distribution systems. The contribution of this work is therefore to demonstrate the applicability of this approach rather than to select the most suitable machine learning technique. Further, we believe that any efficient machine learning technique can yield acceptable results for this application. Future work will consider diverse renewable energy sources, such as wind turbine generating systems.