Forecasting-Aided Monitoring for the Distribution System State Estimation

In this paper, an innovative approach based on an artificial neural network (ANN) load forecasting model is proposed to improve the accuracy of distribution system state estimation. High-quality pseudomeasurements are produced by a neural model fed with both exogenous and historical load information and are applied in a realistic measurement scenario. Aggregated active and reactive powers of small or medium enterprises and residential loads are simultaneously predicted by a one-step-ahead forecast. The correlation between the forecasted real and reactive power errors is duly taken into account in the definition of the estimator, together with the uncertainty of the overall measurement chain. The beneficial effects of the ANN-based pseudomeasurements on the quality of the state estimation are demonstrated by simulations carried out on a small medium-voltage distribution grid.

learning techniques both for load estimation and load forecasting problems.
Traditional load power estimation can be performed from typical load profiles of each user type. A common choice is to assume that the demand at each hour is equal to the demand at the same hour of the previous equivalent day. The load profiles can be considered as derived estimates of the customer load behaviour with high variance. Consequently, the quality of the state estimates obtained with this approach is generally poor. To improve the accuracy of load power consumption estimation, a neural network approach is proposed in [11] to realign the average load profiles with the real power flow measurements available at the network substation. In particular, for each bus, two feed-forward artificial neural networks (ANNs) are trained: the first one associates real active power flow measurements with active power injections, whereas the second one relates real reactive power flow measurements with reactive power injections.
In the literature, algorithms based on Multiple Linear Regression (MLR) analysis [10] and machine learning techniques [12] have been proposed for load forecasting.
Load forecasting is a difficult task, as the time evolution of the loads is complex and exhibits several levels of seasonality: the load at a given hour can be correlated not only with the load at the previous hour but also with the loads at the same hour on the previous day and on the same day of the previous week. Moreover, many important exogenous variables can be considered, especially weather-related variables. In this context, ANNs, with their inherent capability to infer a function from data, represent a more efficient solution for load modelling than linear models. Generating pseudomeasurements using ANNs for load forecasting has been demonstrated to be a viable solution, especially for local-level load modelling [12]. In [13], a closed-loop estimator is proposed for a medium-voltage DS, where a machine learning function provides pseudomeasurements to DSSE. The output of the estimator is then fed back to the machine learning function, which allows the estimation when measurement data are missing. For each medium-voltage (MV) node, two feed-forward ANNs are trained to independently forecast nodal active and reactive power. Load time series and indices categorizing the load time series according to the load patterns are used as ANN inputs. The ANNs are retrained over time whenever a new load time series of an MV node is available. In [14], an SE tool with a closed loop between the estimator and the machine learning function is proposed, as in [13]. In [14], a feed-forward ANN is trained to forecast the active power value one step ahead with three types of inputs: historical load information, weather-related variables, and time-related variables. Once the predictive model has been developed, its performance is monitored continuously in order to detect deterioration over the medium-to-long term (i.e., weeks to months) and to retrain it accordingly.
Among all the proposed machine learning methods, multilayer perceptron (MLP) neural networks have been shown to perform better than others for load forecasting [11,15]. In particular, in [15], five machine learning methods, i.e., MLP, Support Vector Machines, Radial Basis Functions, Decision Trees, and Gaussian Processes, have been compared in forecasting the active power load for the next 100 hours. The comparison has been made using three measures of accuracy (MAPE, Mean Absolute Percentage Error; MAE, Mean Absolute Error; and RMSE, Root Mean Squared Error), showing that the MLP is the most robust of the five.
In this context, this paper proposes an MLP load forecasting model that simultaneously generates high-quality active and reactive power pseudomeasurements for an effective branch-current-based DSSE (BC-DSSE). The estimator uses all the available information, suitably considering uncertainty sources and correlations. In particular, appropriate weights are introduced to take into account the SM and forecast uncertainties. A realistic measurement scenario composed of a few real measurements, SMs, load forecasts, and suitable pseudomeasurements is assumed. In particular, a feed-forward ANN is trained to predict the power demands at each MV node one step ahead. Exogenous variables have been used as predictive model inputs together with historical load information. A closed-loop information flow, from the ANN outputs to the inputs, has been created to allow the BC-DSSE to run even when SM measurements have not been available at the MV node for the last 24 hours. The approach proposed in this paper is validated by means of simulations performed on a grid derived from a portion of a distribution network [16], which is a simplified version of an 18-bus UK radial feeder.

Distribution System State Estimation
DSSE is the key routine to obtain a picture of the network status at a given time instant.
The system is locally considered under steady-state conditions, and the underlying measurement model can be described as follows:

$$\mathbf{z} = \mathbf{h}(\mathbf{x}) + \boldsymbol{\epsilon}, \qquad (1)$$

where $\mathbf{z}$ is the vector of available measurements $z_1, \ldots, z_M$; $\mathbf{x}$ is the vector including all the variables that uniquely define the state of the network; and $\mathbf{h}(\cdot)$ represents all the measurement functions (generally nonlinear) linking the reference measured values to the state of the system. Vector $\boldsymbol{\epsilon}$ includes all the measurement errors and is considered a zero-mean random vector. The state $\mathbf{x}$ can have different formulations (the so-called voltage state or current state, either in polar or rectangular coordinates), which are equivalent from a theoretical perspective but can lead to advantages in the implementation of DSSE solution routines. Hereafter, the following state $\mathbf{x}$ is considered (see the BC-DSSE algorithm in [17]):

$$\mathbf{x} = \left[ V_s,\; I^r_1, \ldots, I^r_{N_{br}},\; I^x_1, \ldots, I^x_{N_{br}} \right],$$

where $V_s$ is the voltage magnitude at a node of the network, which is chosen as the reference (e.g., the slack node), while $I^r_k$ and $I^x_k$ are the real and imaginary parts of the $k$-th branch current, with the branch index ranging from 1 to the number of branches $N_{br}$.
This branch-current formulation allows linearizing some of the measurement functions and thus simplifying the estimation process. From (1), it is clear that, to estimate the network state, it is necessary to manipulate the information coming from the measurements in $\mathbf{z}$, also taking into account the characteristics of the measurement errors. As mentioned above, in DSs, it is hard to have a widespread installation of measurement devices, and thus the availability of real-time measurements is typically limited. The measurement vector $\mathbf{z}$ can thus be divided as follows:

$$\mathbf{z} = \begin{bmatrix} \mathbf{z}_{\text{real-time}} \\ \mathbf{z}_{\text{pseudo}} \end{bmatrix},$$

where $\mathbf{z}_{\text{real-time}}$ includes the $M_r$ real-time measurements, which can be voltage magnitude, current magnitude, and active or reactive power measurements as far as conventional measurements are concerned. Vector $\mathbf{z}_{\text{pseudo}}$ represents information that can be derived from other sources (pseudomeasurements), mainly historical information on active and reactive power consumption or generation. Every load or aggregated load of the network is thus analysed to define its average power absorption or injection. Pseudomeasurements are necessary to allow system observability in this context, but their accuracy is usually very low.

The most widespread technique to perform DSSE is the Weighted Least Squares (WLS) approach, which aims at finding the state that minimizes the following objective function:

$$J(\mathbf{x}) = \left[\mathbf{z} - \mathbf{h}(\mathbf{x})\right]^{T} \mathbf{W} \left[\mathbf{z} - \mathbf{h}(\mathbf{x})\right], \qquad (4)$$

which is the sum of the weighted squared residuals. Matrix $\mathbf{W}$ represents the weighting matrix that allows penalizing or favouring each residual in a different way depending on the accuracy of the corresponding measurement or pseudomeasurement.
$\mathbf{W}$ is usually chosen as the inverse of the variance-covariance matrix of the measurements $\Sigma_z$, so that the measurements are weighted according to their uncertainty (this corresponds to a maximum likelihood estimation in the case of normally distributed measurements). $\Sigma_z$ is typically block-diagonal, since measurements performed by different devices can be considered uncorrelated. The minimization in equation (4) is typically obtained by an iterative Newton solution of the following system of normal equations (considering the generic iteration $i$) [9]:

$$\mathbf{G}_i \, \Delta\mathbf{x}_i = \mathbf{H}_i^{T} \mathbf{W} \left[\mathbf{z} - \mathbf{h}(\mathbf{x}_i)\right],$$

where $\Delta\mathbf{x}_i = \mathbf{x}_{i+1} - \mathbf{x}_i$ is the estimated state variation and

$$\mathbf{H}_i = \left.\frac{\partial \mathbf{h}(\mathbf{x})}{\partial \mathbf{x}}\right|_{\mathbf{x} = \mathbf{x}_i}$$

is the Jacobian of the measurement functions in $\mathbf{h}$ with respect to the state variables, computed at the previously estimated state. The so-called gain matrix $\mathbf{G}_i$ can be written as

$$\mathbf{G}_i = \mathbf{H}_i^{T} \mathbf{W} \mathbf{H}_i,$$

and it is constant when the measurement functions are linear or can be linearized (see [17] for examples and details).
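As an illustration of the iterative WLS solution described above, the following is a minimal pure-Python sketch under simplifying assumptions: the weighting matrix is taken as diagonal (one weight per measurement), and the measurement model is a toy linear one with two state variables. The function names (`solve_linear`, `wls_gauss_newton`) and the toy data are hypothetical and are not the BC-DSSE implementation of [17].

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wls_gauss_newton(z, h, jac, weights, x0, tol=1e-9, max_iter=20):
    """Minimize sum_k w_k * (z_k - h_k(x))**2 (diagonal W assumed)."""
    x = list(x0)
    n, m = len(x), len(z)
    for _ in range(max_iter):
        residual = [zk - hk for zk, hk in zip(z, h(x))]
        H = jac(x)  # m x n Jacobian evaluated at the current state
        # Gain matrix G = H^T W H and right-hand side H^T W r
        G = [[sum(weights[k] * H[k][i] * H[k][j] for k in range(m))
              for j in range(n)] for i in range(n)]
        rhs = [sum(weights[k] * H[k][i] * residual[k] for k in range(m))
               for i in range(n)]
        dx = solve_linear(G, rhs)
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:
            break
    return x


# Toy example: two accurate measurements and one poor pseudomeasurement.
def h_toy(x):
    return [x[0], x[1], x[0] + x[1]]


def jac_toy(x):
    return [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


z_toy = [1.0, 2.0, 3.6]           # the third value is the pseudomeasurement
sigma_toy = [0.01, 0.01, 0.5]     # assumed standard deviations
w_toy = [1.0 / s ** 2 for s in sigma_toy]
x_hat = wls_gauss_newton(z_toy, h_toy, jac_toy, w_toy, [0.0, 0.0])
```

Because the pseudomeasurement carries a much smaller weight, the estimate stays close to the two accurate measurements. Handling correlated measurements would require a full (block-diagonal) weighting matrix instead of the diagonal one assumed here.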
Pseudomeasurements are employed similarly to other measurements but are usually associated with very large uncertainties (e.g., derived from the historical variability of loads), which result in small contributions to the objective function in equation (4). Focusing on active power, a typical approach is to consider the following quantities for an unmonitored network node $j$:

$$z_{P_j} = \frac{1}{|\Gamma|} \sum_{n \in \Gamma} P_j(nT_s), \qquad \sigma_{z_{P_j}} = \sqrt{\frac{1}{K} \sum_{n \in \Gamma} \left[P_j(nT_s) - z_{P_j}\right]^2},$$

where $z_{P_j}$ and $\sigma_{z_{P_j}}$ indicate the active power pseudomeasurement and its standard deviation, respectively, $P_j(nT_s)$ is the active power injected into node $j$ (the injection convention is used for both absorbed and generated power), and $T_s$ defines the time resolution of the available information (historical data). The set $\Gamma$ includes the considered time instant indices (with $n$ indicating the generic time instant) for all the available power samples, and $K = |\Gamma| - 1$ is its cardinality decreased by one to obtain the classical unbiased variance estimator. The higher the variability of the load/generator, the larger the corresponding variance, which is inversely proportional to the weight $w_{z_{P_j}}$ associated with the pseudomeasurement. A typical variation range of the power drawn by a load is over 50% of its nominal value, so it is easy to see that pseudomeasurements, while guaranteeing observability, are often of little help in improving the estimation accuracy.
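The conventional pseudomeasurement definition above amounts to a sample mean and an unbiased sample standard deviation. A minimal sketch follows, assuming the historical samples are available as a plain list; the function name `pseudomeasurement` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def pseudomeasurement(history):
    """Return (z_P, sigma_P, weight) from historical samples P_j(n*T_s).

    stdev() divides by len(history) - 1, i.e. the unbiased estimator
    with K = |Gamma| - 1 mentioned in the text.
    """
    z_p = mean(history)
    sigma_p = stdev(history)
    weight = 1.0 / sigma_p ** 2  # WLS weight as inverse variance
    return z_p, sigma_p, weight


# Hypothetical historical active power samples (kW) for one node.
z_p, sigma_p, w_p = pseudomeasurement([90.0, 110.0, 100.0, 120.0, 80.0])
```

The larger the spread of the historical samples, the smaller the resulting weight in the WLS objective.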
In this context, SMs, and, in particular, those of the second generation, can play a significant role in enhancing the definition of pseudomeasurements. New SMs can provide voltage magnitude and active and reactive power measurements, with a much faster reporting rate than before. The Italian authority for electric energy, for instance, issues and continuously updates directives on the functionalities of new-generation SMs, which include, among others, a 2 s measurement interval for "instantaneous" power measurements [18]. These measurements might, in the future, be directly acquired and integrated into estimation algorithms, but due to the huge number of installed SMs (above 30 million in Italy), it would be difficult to manage them directly in real time, and the investment costs for communication and computation could easily become overwhelming.
For this reason, it is much more likely that SM measurements will be used in an indirect and delayed way. The approach proposed in this paper is to collect active and reactive power measurements from SMs and exploit them to forecast the power consumption at a given time, with a lead time compatible with the timing and data collection requirements. To this purpose, load forecasting Artificial Neural Network (ANN) models have been trained for different types of loads and/or aggregations of loads, as detailed in the following section. In fact, SM measurements from large customers or from a set of users connected to a given node (for instance, in a medium-voltage network) can be gathered from the field one day or a few hours ahead of the time instant of BC-DSSE execution and aggregated, so that they can serve as inputs to the neural load forecasting process.

Complexity
As previously mentioned, once the forecast power consumption or generation of a given node is obtained, it can be included in z as an enhanced pseudomeasurement and it is thus important to associate the correct covariance matrix to the new forecast quantities, thus allowing a correct weighting of the corresponding residual in the WLS procedure, according to equation (4).
Focusing on a generic bus $j$, the forecast procedure gives two predicted quantities $P^F_j$ and $Q^F_j$, which are the active and reactive power injections, respectively. The following covariance matrix is thus needed:

$$\Sigma_{z_{PQ_j}} = \begin{bmatrix} \sigma^2_{z_{P_j}} & \sigma_{z_{P_j} z_{Q_j}} \\ \sigma_{z_{P_j} z_{Q_j}} & \sigma^2_{z_{Q_j}} \end{bmatrix}, \qquad \sigma_{z_{P_j} z_{Q_j}} = \rho_{PQ_j}\, \sigma_{z_{P_j}} \sigma_{z_{Q_j}}, \qquad (8)$$

where $\sigma^2_{z_{P_j}}$ and $\sigma^2_{z_{Q_j}}$ are the variances of the two pseudomeasurements, $\sigma_{z_{P_j} z_{Q_j}}$ is the covariance between the pseudomeasurement errors of the two forecast powers, and $\rho_{PQ_j}$ is the corresponding correlation coefficient. It is interesting to notice that, while in conventional DSSE approaches the active and reactive power pseudomeasurements are usually considered uncorrelated, in the proposed approach further information is taken into account, and the correlation arising in the simultaneous estimation of $P_j$ and $Q_j$ can be easily included in the estimator by using the corresponding submatrix in the overall weighting matrix.

Another important aspect, which is usually overlooked in the literature, is the modelling of the SM uncertainty. The proposed load forecasting is designed to obtain an estimate of the measured $P_j$ and $Q_j$ at a given time instant $nT_s$ from previously available measurements, but this means that the computed power values can be considered only as approximations of the measured values (reference values are obviously unknown in practical conditions and real-time operation). The SM measurement chain is an additional source of uncertainty that affects the values considered in equation (8). As an example, the calibration process of SM devices cannot be perfect and cannot compensate for all the systematic errors in all operating conditions.
For this reason, in the following, the definition of the weights is discussed in detail in the presence of both forecast and SM errors. Focusing on $z_{P_j}$ and $z_{Q_j}$, it is possible to distinguish the two zero-mean error contributions as follows:

$$z_{P_j} = P^{ref}_j + e^F_{P_j} + e^{SM}_{P_j}, \qquad z_{Q_j} = Q^{ref}_j + e^F_{Q_j} + e^{SM}_{Q_j},$$

where $P^{ref}_j$ and $Q^{ref}_j$ are the ideal reference values of active and reactive power, $e^F_{P_j}$ and $e^F_{Q_j}$ are the corresponding forecast errors, and $e^{SM}_{P_j}$ and $e^{SM}_{Q_j}$ are the errors associated with the aggregated SM outputs. In the following, in the absence of further information, all the errors of the SMs associated with loads or generators grouped under the $j$-th node are considered independent. Similarly, the active and reactive power measurement errors are assumed uncorrelated.
Given the relative standard uncertainty $\alpha_{SM}$ of the generic SM (as derived from the SM datasheets and assumed to be common to all SMs, without loss of generality), the relative standard uncertainty associated with $e^{SM}_{P_j}$ becomes

$$\alpha_{e^{SM}_{P_j}} = \frac{\sigma_{e^{SM}_{P_j}}}{\left|\sum_{i \in \Lambda_j} P^{SM}_i\right|} = \alpha_{SM}\, \frac{\sqrt{\sum_{i \in \Lambda_j} \left(P^{SM}_i\right)^2}}{\left|\sum_{i \in \Lambda_j} P^{SM}_i\right|},$$

where $P^{SM}_i$ is the measured active power of load $i$, which belongs to the set $\Lambda_j$ of the loads downstream of the MV node $j$. Since the lack of knowledge of the SM behaviour can be considered independent of the prediction errors, the overall variance of the measurement $z_{P_j}$ can be expressed as follows:

$$\sigma^2_{z_{P_j}} = \sigma^2_{e^F_{P_j}} + \sigma^2_{e^{SM}_{P_j}}.$$

Similar expressions are valid for the reactive power, while the correlation coefficient $\rho_{PQ_j}$, under the above assumptions, becomes

$$\rho_{PQ_j} = \rho^F_{PQ_j}\, \frac{\sigma_{e^F_{P_j}}\, \sigma_{e^F_{Q_j}}}{\sigma_{z_{P_j}}\, \sigma_{z_{Q_j}}},$$

where $\rho^F_{PQ_j}$ is the correlation coefficient of the active and reactive power forecast errors. The way these values are computed in practice is explained in the following section, where the information available at each step is discussed. The above equations allow the computation of the elements of $\Sigma_{z_{PQ_j}}$ (see equation (8)) and thus the definition of weights for the BC-DSSE that appropriately reflect the actual uncertainty in the proposed pseudomeasurement model.
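To make the combination of forecast and SM uncertainty concrete, the following sketch builds the 2 × 2 covariance matrix of the P and Q pseudomeasurements under the assumptions stated above (independent SM errors, SM errors independent of forecast errors, and uncorrelated SM errors on P and Q). The function names and numerical values are hypothetical, and the forecast standard deviations and correlation are taken as given inputs.

```python
import math


def aggregated_sm_std(sm_powers, alpha_sm):
    """Std of the sum of independent SM errors, each with relative
    standard uncertainty alpha_sm applied to its own reading."""
    return alpha_sm * math.sqrt(sum(p * p for p in sm_powers))


def pq_covariance(sigma_f_p, sigma_f_q, rho_f, sigma_sm_p, sigma_sm_q):
    """Return (2x2 covariance matrix, overall correlation coefficient)."""
    var_p = sigma_f_p ** 2 + sigma_sm_p ** 2
    var_q = sigma_f_q ** 2 + sigma_sm_q ** 2
    # Only the forecast errors are correlated under the stated assumptions.
    cov_pq = rho_f * sigma_f_p * sigma_f_q
    rho = cov_pq / math.sqrt(var_p * var_q)
    return [[var_p, cov_pq], [cov_pq, var_q]], rho


# Hypothetical values: forecast std 3, SM std 4 for both P and Q.
cov, rho = pq_covariance(3.0, 3.0, 0.5, 4.0, 4.0)
```

The SM contribution enlarges both variances but not the covariance, so the overall correlation coefficient is smaller in magnitude than that of the forecast errors alone.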

Load Forecasting Neural Network Model
A Multilayer Perceptron (MLP) ANN with one hidden layer is used to forecast the load demand one step ahead (roughly speaking, a "t + 1 prediction," where the actual time step depends on the chosen ANN model in terms of input and output variables), where the time interval for prediction updates is assumed equal to half an hour. In particular, residential and Small and Medium Enterprise (SME) loads have been considered in this paper. An MLP has been trained for each single or aggregated load. It has to be noted that, while for the residential loads only the active power has been forecasted, as in the majority of the literature, for the SME loads the corresponding reactive power is also considered in the proposed model. Figure 1 shows the structure of the MLP neural network for an SME.
The relationship between input and output patterns is described by the following system of algebraic equations:

$$\begin{aligned} &\text{input layer:} & \mathbf{y} &= \mathbf{W}_1 \mathbf{v} + \mathbf{b}_1, \\ &\text{hidden layer:} & \mathbf{h}_l &= f(\mathbf{y}), \\ &\text{output layer:} & \hat{\mathbf{o}} &= \mathbf{W}_2 \mathbf{h}_l + \mathbf{b}_2, \end{aligned} \qquad (14)$$

where $\mathbf{v}$ is the input vector, which contains variables related to the time instant $t$; $\mathbf{W}_1$ is the weight matrix of the input layer; $\mathbf{b}_1$ is the bias vector of the input layer; $\mathbf{y}$ is the input of the hidden layer; $\mathbf{h}_l$ is the output of the hidden layer; $f(\cdot)$ is the hidden neuron (logistic) activation function; $\mathbf{W}_2$ is the weight matrix of the output layer; $\mathbf{b}_2$ is the bias vector of the output layer; and $\hat{\mathbf{o}}$ denotes the network output, i.e., the predicted powers.

At time instant $nT_s$, which corresponds to the current time tag of the DSSE computation update, information on the active and/or reactive powers of node $j$ is needed. In the proposed solution, if the ANN forecast model is available for node $j$, powers $P_j$ and/or $Q_j$ are obtained from the performed prediction. In this case, the instant indicated as "t + 1" in the forecast model description (see Figure 1) corresponds to the current time instant $nT_s$. For ease of presentation, a perfect match has been adopted here between $T_s$ and the time interval of the prediction update (half-hour step for both prediction and estimation updates), but other solutions at different rates are also possible following a similar scheme. As mentioned above, an ANN model is built for all the loads of interest, and its outputs (predicted load powers) are fed into the DSSE algorithm at the following time step, with the procedure described in the previous section.
The Levenberg-Marquardt algorithm [19], which combines the gradient descent method and the Gauss-Newton method, has been used for the MLP training. The hyperbolic tangent sigmoid transfer function is used in the hidden layer. The inputs and outputs are normalized to the range [−1, 1] before being used to train the ANN, to balance the importance of the input variables. In the following, the case study and the database used to train and test the forecasting models are described.
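The forward pass of a one-hidden-layer MLP and the scaling to [−1, 1] can be sketched as follows. This is only an illustration: the hidden layer uses the hyperbolic tangent mentioned above, the output layer is assumed to be linear (the text does not state its activation), min-max scaling is assumed for the normalization, training by Levenberg-Marquardt is not shown, and all names and weights are hypothetical.

```python
import math


def scale_to_unit_range(value, lo, hi):
    """Map value from [lo, hi] to [-1, 1] (min-max normalization)."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def mlp_forward(v, w1, b1, w2, b2):
    """One-hidden-layer MLP: tanh hidden units, linear output (assumed)."""
    # Input of the hidden layer: y = W1 v + b1
    y = [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(w1, b1)]
    # Output of the hidden layer: h = tanh(y)
    h = [math.tanh(a) for a in y]
    # Network output: W2 h + b2
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(w2, b2)]


# Hypothetical tiny network: 2 inputs, 1 hidden neuron, 2 outputs (P and Q).
outputs = mlp_forward(
    v=[scale_to_unit_range(5.0, 0.0, 10.0), 0.3],
    w1=[[0.0, 0.0]], b1=[0.0],
    w2=[[1.0], [2.0]], b2=[0.5, -0.5],
)
```

With the dimensions reported for the case study, `w1` would have 20 rows and 9 or 13 columns, and `w2` would have 1 or 2 rows and 20 columns.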

Case Study
To evaluate the performance of the proposed approach, several tests have been carried out on a single-phase 18-bus network derived from a UK network (Figure 2) [20]. Connected to the 33 kV system at bus 1, the network has a common rated bus voltage of 11 kV. This network is used since it has been adopted in other studies in the literature and gives a realistic load scenario for an MV network. On this topology, both industrial and residential loads have been considered, as explained in detail in the next section. There is no loss of generality in considering load information (SM measurements) coming from the database described in the next section, since individual loads are aggregated to replicate a load scenario that is compatible with the nominal data of the considered network in [20]. The main assumptions and test results are also reported and discussed in the following.

Database for the Load Forecasting.
To train and test the neural networks used to perform the load forecasting, data related to the active power consumption, available from the Commission for Energy Regulation (CER) [21], have been used. This database is anonymized, and it consists of half-hourly SM energy consumption recordings from 6445 customers that participated in the "Electricity Smart Metering Customer Behavior Trials" [22]. The data were collected over a period of 18 months (from July 14, 2009, to December 31, 2010) at various distribution network locations in Ireland.
The customer types are classified as residential (4225), SME (485), and others (1735). The present paper focuses on residential (the largest group in the available database) and SME customers who completed the trial. The SME customers are grouped into four subsectors: entertainment (including hotels, restaurants, sporting facilities, and public houses), industrial manufacturing, offices, and retail premises.
After removing the consumers having missing data, a database of 3423 residential and 287 SME consumers has been obtained. For SME loads, the reactive power profiles have been obtained starting from a real power factor (PF) profile recorded on a typical industrial site. In particular, the recorded PF profile has been propagated on the whole observation period. No reactive power was taken into account for residential loads.
Analyzing residential and SME data, a significant difference between the two consumer types can be observed. Figure 3 shows the power consumption (P, where the node index is dropped in the following when unnecessary, for the sake of simplicity) of four residential loads randomly selected from the database. As can be noted, the energy consumption of each household is low, irregular, and very different on two consecutive working days, as it depends on the lifestyle of its residents. On the contrary, the SME loads (Figure 4) appear typically high and mostly regular due to the regular activity during working hours and working days. Therefore, a different load forecasting performance is expected for these two subsets.

Figure 1: Structure of the active (P) and reactive (Q) power one-step-ahead forecasting neural network model for SMEs (the structure of the neural network for residential loads considers only active power both in the input and the output layers).
Since the objective of this paper is an effective state estimation of a DS, different equivalent MV loads have been created by aggregating loads from the database, with the aim of obtaining power levels compatible with those of the considered grid (Figure 2). The load aggregation has been performed by simply summing the readings of several SMs. In particular, three different SME loads ($sme_1$, $sme_2$, and $sme_3$) are obtained by aggregating the data of 21, 26, and 46 individual SMEs, respectively. Moreover, two different residential loads ($res_1$ and $res_2$) are obtained by aggregating the energy consumption of 523 and 537 randomly chosen residential loads, respectively. In Table 1, the ranges of the active power of the aggregated loads are reported. Figure 5 shows the power consumption of one aggregated residential load ($res_1$) and one aggregated SME load ($sme_3$) on the same two consecutive days considered in Figures 3 and 4. As expected, the aggregation makes the load profile more predictable. In fact, especially in the case of residential loads (Figure 5(a)), the aggregated load is more periodic and smoother than the individual ones. This is because the aggregation removes the high-frequency impulses corresponding to random events in the individual curves, smoothing out the randomness. Regarding the aggregated SME load (Figure 5(b)), its periodicity is more evident because the individual SME loads are already more periodic than individual residential loads.

The demand profiles depend not only on the historical load evolution but also on exogenous variables, such as season and weather-related variables. Therefore, weather data, collected from the Irish Meteorological Service [23], have been added to the database. Among the weather variables, in this paper, the temperature (in °C) and the humidity (in percentage) have been chosen as neural model inputs. This is because the power system demand varies with the temperature.
Furthermore, the humidity plays a relevant role in driving electricity demand during the warm months. In fact, the temperature above certain values is intensified by high humidity [24]. As the aggregation has been performed by selecting individual loads located in different Irish areas, the simple averages of the temperature and the humidity percentage measured by several weather stations located in the central area of Ireland are used to represent the corresponding weather variables.
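The aggregation of SM readings and the averaging of weather station data described above can be sketched as follows, assuming all series are time-aligned lists with the same half-hourly sampling. The function names and sample values are hypothetical.

```python
def aggregate_loads(profiles):
    """Sum time-aligned SM profiles sample by sample."""
    lengths = {len(p) for p in profiles}
    if len(lengths) != 1:
        raise ValueError("profiles must be non-empty and equally long")
    return [sum(samples) for samples in zip(*profiles)]


def average_weather(station_series):
    """Simple average across weather stations at each time step."""
    n = len(station_series)
    return [sum(samples) / n for samples in zip(*station_series)]


# Hypothetical readings: two meters and two weather stations, two time steps.
aggregated = aggregate_loads([[1.0, 2.0], [3.0, 4.0]])
mean_temperature = average_weather([[10.0, 12.0], [14.0, 16.0]])
```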
To highlight the dependence of the load energy consumption on the weather data, Figure 6 reports the daily energy consumption of $res_1$ and $sme_3$ and the average daily temperature trend over the same period. As can be noted, both residential and SME aggregated loads show a time pattern dependent on the temperature, with a stronger dependence for the residential load. Obviously, the dependence of the load energy consumption on the temperature, even if prevailing, is not the only one. In fact, the drastic reduction in the consumption of SME loads and the increase in the residential one at the end of December 2009 are mainly related to the Christmas-New Year period rather than to the temperature. In this paper, the dependence of the load energy consumption on the weather data is demonstrated by evaluating how the performance of the forecasting model is affected when the weather variables are excluded from the inputs of the model. The results are shown in the following section. Other weather variables, such as precipitation and wind speed, were analyzed, but it was found that, in this case, they do not have a significant impact on the energy demand.
As Figure 6 highlights, the considered energy consumption time series shows strong regularity, and a spectrum analysis revealed a prevalent daily periodicity, but at the same time, it is decidedly nonlinear. This complex data behavior can be captured by an MLP, which is able to construct a large set of nonlinear input/output mappings by combining an appropriate number of nonlinear activation functions. Therefore, in this case study, a simple MLP with a suitable structure can be enough to build a well-performing forecasting model while avoiding overfitting of the training data.

Performance of the Load Forecasting Models.
A neural load forecasting model has been designed for each aggregated load characterized by the range reported in Table 1. In order to optimize the load forecasting network architecture, a trial-and-error approach has been used to choose the appropriate number of hidden layer nodes: the number of nodes is progressively increased, and the network that minimizes the prediction error on the validation set is selected. This optimization procedure resulted in 20 neurons for all five networks (associated with the loads in Table 1). Therefore, the best MLP architecture consists of an input layer with one neuron for each input variable (thus 9 or 13, for residential or SME loads, respectively), one hidden layer with 20 neurons, and an output layer with one neuron for each output variable (1 or 2 for residential or SME loads, respectively). Thus, the dimensions of the weight matrices and bias vectors in equation (14) are 20 × 9 for $W_1$, 20 × 1 for $b_1$, 1 × 20 for $W_2$, and 1 × 1 for $b_2$ in the case of residential loads. In the case of SME loads, $W_1$ and $b_1$ are 20 × 13 and 20 × 1, respectively, whereas $W_2$ and $b_2$ are 2 × 20 and 2 × 1, respectively.

The time series of each load profile is composed of 25728 half-hourly active and reactive (when considered) power values, from August 1, 2009, to December 31, 2010, while the July 2009 data were not used as only the recordings of fifteen days were available. The MLP training has been performed using the data of the first 12 months. The validation has been performed using the following 2 months (from August 1, 2010, to September 30, 2010). The last 3 months (from October 1, 2010, to December 31, 2010) have been used to test the trained neural model.
Since the forecasting accuracy depends on both the quality and the quantity of the historical data used to train the predictor, a greater amount of data (for example, an extra year) would certainly improve the prediction performance.
Note that a realistic assumption about the monitoring architecture could be that data from SMs are actually available after 24 hours. However, the proposed solution can also work with data collected one hour before (which could represent a future-proof scenario). Thus, during the training of the neural model, this information, for both active and reactive powers, has been included among the inputs, because it is always available in the offline phase. In the online test phase, the corresponding inputs have been replaced by the values forecasted by the neural predictor at the previous steps. The outputs of the predictor are thus fed back to the input layer, creating a closed-loop information flow. This allows the state estimation even when measurement data are missing.
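The closed-loop use of the predictor in the online phase can be sketched as a rolling one-step-ahead forecast in which unavailable recent loads are replaced by earlier predictions. The interface below (a `predict` callable taking a window of recent loads and the exogenous inputs for the step) is a hypothetical simplification, and the stand-in predictor is not the trained MLP.

```python
def closed_loop_forecast(predict, recent_loads, exogenous, steps):
    """Roll a one-step-ahead predictor forward, feeding outputs back.

    predict(window, exo) -> forecast for the next step (hypothetical API).
    recent_loads: last available load values, oldest first.
    exogenous: one entry of exogenous inputs per forecast step.
    """
    window = list(recent_loads)
    forecasts = []
    for k in range(steps):
        y_hat = predict(window, exogenous[k])
        forecasts.append(y_hat)
        # Drop the oldest value and append the forecast in place of the
        # measurement that is not yet available.
        window = window[1:] + [y_hat]
    return forecasts


# Stand-in predictor: average of the two most recent values.
def toy_predict(window, exo):
    return 0.5 * (window[-1] + window[-2])


forecasts = closed_loop_forecast(toy_predict, [1.0, 3.0], [None, None], 2)
```

Since each forecast is built partly on earlier forecasts, errors can accumulate over long gaps, which is why monitoring the test-phase performance matters.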
To evaluate the performance of the predictive models, the MAE, the MAPE, and the Root Mean Square Percentage Error (RMSPE), defined as in the following, have been used:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| o_i - \hat{o}_i \right|, \qquad \mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{o_i - \hat{o}_i}{o_i} \right|, \qquad \mathrm{RMSPE} = 100 \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{o_i - \hat{o}_i}{o_i} \right)^2},$$

where $o_i$ is the actual load value, which can thus represent either the active power P or the reactive power Q of the considered load (respectively, $P_j$ and $Q_j$ when referring to the network nodes as in the DSSE section above); $\hat{o}_i$ is the corresponding predicted load value; and $n$ is the number of training or testing samples. The smaller the values of MAE, MAPE, and RMSPE, the better the forecasting performance. Figure 7 shows (top) the actual (black line) and predicted (red line) active power load time series and (bottom) the corresponding differences between predicted and actual load powers for one month (October) of the test set for $sme_1$. Figure 8 reports the same time series for $res_1$. As can be noted, the trends of the two loads are effectively modeled by the neural predictors. Figures 9 and 10 report the behavior of the actual and predicted real power load time series (top) and the corresponding prediction error (bottom) for $sme_1$ and $res_1$, respectively, during the first test week. The MAPE of the forecasted active power in this time window is 4.7% for $sme_1$ and 4.8% for $res_1$. Moreover, the validity of the zero-mean hypothesis has been verified for the errors of both active and reactive power prediction.
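The three accuracy measures can be computed as in the sketch below. The MAE and MAPE follow their standard definitions; for the RMSPE, the common definition (root of the mean squared relative error, in percent) is assumed. Actual values are assumed to be nonzero, and the function names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(
        abs((o - p) / o) for o, p in zip(actual, predicted)
    ) / len(actual)


def rmspe(actual, predicted):
    """Root Mean Square Percentage Error, in percent (assumed definition)."""
    return 100.0 * math.sqrt(
        sum(((o - p) / o) ** 2 for o, p in zip(actual, predicted))
        / len(actual)
    )


# Hypothetical actual and predicted powers (kW).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = (mae(actual, predicted), mape(actual, predicted),
          rmspe(actual, predicted))
```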
In Table 2 and in Table 3, the training and test performances, obtained for the SME and residential aggregate loads, are reported, respectively.
It can be noted that, as expected, the performance deteriorates in the test phase. For the SME loads, the error percentages of the active and reactive powers are very similar. The correlation coefficients between the real and reactive power errors are then evaluated to be used in the following for the state estimation.
The results on the test set show that the proposed predictive model is able to forecast both active and reactive powers simultaneously (when required) with limited errors, starting from both exogenous and historical measurements. Moreover, it overcomes the problem of limited or time-delayed availability of historical measurements through a closed-loop information flow, which replaces the missing data with values forecasted by the predictor itself at the previous step. The influence of the input variables on the load energy consumption can be evaluated through the performance of the forecasting model. In fact, the performance of the neural network model is expected to deteriorate when an effective variable is excluded from the inputs. In this paper, since the aggregated loads show a time pattern dependent on the temperature, as highlighted in Figure 6, the dependence of the load energy consumption on the weather data has been assessed. Removing the weather variables, the RMSPE on the test set increases by about 2% for both active and reactive powers for SME customers, and by about 1% for the residential ones (which are significant variations with respect to the results reported in Tables 2 and 3).

Performance of the Distribution Systems State Estimation
To assess the estimator performance, several simulations have been carried out starting from a measurement scenario that is realistic for a distribution grid. Two measurement points have been assumed on the network: bus 1, with a voltage magnitude measurement and an active and reactive power flow measurement, and bus 4, with a voltage magnitude measurement. The SMs have been considered as providing data that are fully available the day after the measurements. An accuracy of 1% for the voltage magnitude and of 3% for the power flows and SM measurements has been assumed. A Monte Carlo approach has been applied in order to obtain statistically sound results, under the following assumptions:
(i) Number of Monte Carlo trials, $N_{MC} = 1000$
(ii) A maximum deviation of 50% with respect to the nominal values for the active and reactive powers drawn by the loads (uniform distribution)
(iii) Measurement errors uniformly distributed

The SME loads sme$_1$, sme$_2$, and sme$_3$ have been connected to buses 17, 14, and 7, respectively (and are thus described by the pairs $P_{17}$-$Q_{17}$, $P_{14}$-$Q_{14}$, and $P_7$-$Q_7$), while the residential loads res$_1$ and res$_2$ have been associated with buses 3 and 12 (and thus with $P_3$ and $P_{12}$). For these loads, the last 1000 values of the test set have been considered, which correspond to a time interval of about 21 days. For each instant, a different operating condition of the network is thus considered by using these values for the SME and residential loads and extracting the reference values by applying the SM uncertainty. For all the other loads, active and reactive powers are extracted from the nominal values according to the above assumptions. Then, all reference values are computed from these load conditions by means of a load flow calculation. Finally, the measurements are also extracted from their random distributions and used as inputs to the BC-DSSE.
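The random extraction used in each Monte Carlo trial can be illustrated with the short Python sketch below. It is a simplified picture, not the simulation code of the paper: the stated accuracies and the 50% deviation are assumed to be the half-widths of the uniform intervals, `toy_reference` is a hypothetical placeholder for the load flow calculation, and the nominal load values are invented for the example.

```python
import random

N_MC = 1000                # number of Monte Carlo trials (from the text)
MAX_LOAD_DEVIATION = 0.50  # maximum deviation of the loads from nominal (from the text)
VOLTAGE_ACCURACY = 0.01    # voltage magnitude accuracy (from the text)
POWER_ACCURACY = 0.03      # power flow accuracy (from the text)


def draw_uniform(value, max_relative_deviation, rng):
    """Draw from a uniform distribution centred on value.

    Assumption: the stated accuracy or deviation is the half-width of the interval.
    """
    return value * (1.0 + rng.uniform(-max_relative_deviation, max_relative_deviation))


def toy_reference(loads):
    """Hypothetical placeholder for the load flow: lossless feeder, flat voltage."""
    return {"V1": 1.0, "V4": 1.0, "P1": sum(loads.values())}


def run_trial(nominal_loads, rng):
    """One trial: extract the loads, compute the references, extract the measurements."""
    loads = {bus: draw_uniform(p, MAX_LOAD_DEVIATION, rng)
             for bus, p in nominal_loads.items()}
    reference = toy_reference(loads)
    measurements = {}
    for name, value in reference.items():
        accuracy = VOLTAGE_ACCURACY if name.startswith("V") else POWER_ACCURACY
        measurements[name] = draw_uniform(value, accuracy, rng)
    return loads, reference, measurements


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
nominal_loads = {2: 120.0, 5: 80.0, 9: 60.0}  # hypothetical nominal powers in kW
trials = [run_trial(nominal_loads, rng) for _ in range(N_MC)]
print(len(trials), "trials generated")
```

In a full simulation, the placeholder would be replaced by an actual load flow solver, and the SME and residential loads would be taken from the test set rather than drawn around nominal values.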
To assess the performance of the estimator, two different formulations and configurations of the BC-DSSE have been adopted, which correspond to different ways of computing and managing the pseudomeasurements. The first one, which uses the proposed estimator, exploits the load predictions provided by the corresponding neural load forecasting models. This case is indicated as "Prediction" in Figures 11-13, which report the DSSE results in the following.
As mentioned in the previous section, the forecast active and reactive powers of the SME loads ($P^F_j$ and $Q^F_j$, with $j \in \{7, 14, 17\}$) have been used for each instant together with the forecast active power of the residential loads ($P^F_j$ with $j \in \{3, 12\}$). To build the weighting matrix, besides the real-time measurement weights, the submatrices concerning the forecast loads must be included in the BC-DSSE (see equations (9) and (10)). According to equation (12) (and its counterpart for reactive power), the variance of each pseudomeasurement is computed by using the RMSPE of the training set for the forecast errors and the datasheet information for the SMs. Since the SM measurements are not available in real time, the relative uncertainty of the aggregated power is evaluated for each time instant as the relative uncertainty of the day before at the same hour, and it is associated with the aggregated forecast powers at the current instant. The above procedure has been applied for all the estimations. It is interesting to notice that, with this model, the RMSPE can also be updated at fixed intervals by considering the measurements and forecast data obtained in the meantime.

Figure 7: Test set behavior related to sme$_1$: (a) the red curve represents the predicted real power values and the black curve represents the actual ones; (b) the differences between predicted and actual load powers.

The second formulation corresponds to the classical BC-DSSE, where no forecasting is considered and the pseudomeasurements of nodes 3, 7, 12, 14, and 17 are directly computed from the available measurement data. In particular, the measured power values collected the day before at the same hour are used as pseudomeasurements.
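For the first formulation, the way a forecast pair (P, Q) of one SME load could enter the weighting matrix is sketched below. This is not equation (12) of the paper: the sketch assumes that the forecast contribution (from the RMSPE) and the SM contribution add in quadrature, that only the forecast errors of P and Q are correlated, and that the weight submatrix is the inverse of the resulting covariance, as in standard weighted least squares. All function names and numerical values are hypothetical.

```python
import math


def pseudomeasurement_covariance(p_f, q_f, rmspe_p, rmspe_q, u_sm_p, u_sm_q, rho):
    """2x2 covariance of the forecast pair (P, Q) of one load (assumed model).

    rmspe_p, rmspe_q: training-set RMSPE of the forecasts, in percent
    u_sm_p, u_sm_q:   relative standard uncertainties of the aggregated SM data
    rho:              correlation coefficient of the P and Q forecast errors
    """
    sigma_fp = abs(p_f) * rmspe_p / 100.0  # forecast error standard deviation of P
    sigma_fq = abs(q_f) * rmspe_q / 100.0  # forecast error standard deviation of Q
    # Assumption: forecast and SM contributions are independent and add in quadrature.
    var_p = sigma_fp ** 2 + (p_f * u_sm_p) ** 2
    var_q = sigma_fq ** 2 + (q_f * u_sm_q) ** 2
    # Assumption: only the forecast errors of P and Q are correlated.
    cov_pq = rho * sigma_fp * sigma_fq
    return [[var_p, cov_pq], [cov_pq, var_q]]


def invert_2x2(m):
    """Inverse of a 2x2 matrix, used here as the weight submatrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Hypothetical values: 3% SM accuracy read as the half-width of a uniform interval.
u_sm = 0.03 / math.sqrt(3.0)
cov = pseudomeasurement_covariance(100.0, 40.0, 5.0, 6.0, u_sm, u_sm, 0.6)
weights = invert_2x2(cov)
print(cov)
print(weights)
```

The off-diagonal term is what distinguishes this block from a purely diagonal weighting: it lets the estimator exploit the fact that the active and reactive power forecast errors of the same load tend to move together.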
This estimator thus does not apply predictions (it is referred to as the "No prediction" algorithm) and is considered as a benchmark for the proposed method in the same network scenarios.

Once the state variables (branch currents) are estimated along with the derived quantities (e.g., voltages and power flows), the results obtained with the two methods are compared in terms of percent root mean square errors (RMSEs) of the estimations (i.e., the square root of the mean of the squared differences between the estimated quantities and the corresponding reference values). The RMSE results of the branch-current magnitude estimations are presented in Figure 11. The bar plot in red (dashed line) shows the results obtained considering the prediction of the loads, while the bar plot in grey (the same holds for Figures 12 and 13) presents the results obtained considering the above-described pseudomeasurements. A reduction close to 12 percentage points (meaning that the error is about halved) has been obtained as the best case on branch 6, where the largest forecast load, sme$_3$, is connected, and an average reduction of more than 3 percentage points is also obtained. It is clear that the reductions in the estimation errors are more evident close to the position of the forecast loads. Branches 12 and 13 show the same accuracy results, since node 13 is a zero-injection node. The same holds for the pairs 7-8 and 14-15.

Figure 12 shows the results obtained in terms of percent RMSE of the active power flow estimations for all the network branches. The bar plot in violet (dashed line) shows the results obtained considering the prediction. The considerations that can be drawn from these results are similar to those for the branch currents: the estimation improvements are more evident for the branches that are close to the larger predicted loads. The error reduction is also more pronounced when lateral branches or leaves of the network are considered.
In this case, the proposed algorithm brings a maximum reduction of about 11.5 percentage points (a relative error reduction of about 44%). Moreover, an average reduction of more than 3.3 percentage points is obtained.
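The two figures quoted for the maximum reduction can be related with simple arithmetic, sketched below in Python. The sketch assumes that the 11.5% is a difference between percent RMSEs and that the 44% is that difference relative to the "No prediction" error; since both figures are rounded in the text, the implied values are only indicative. The function names are hypothetical.

```python
def reductions(rmse_no_prediction, rmse_prediction):
    """Absolute reduction (percentage points) and relative reduction (percent)."""
    absolute = rmse_no_prediction - rmse_prediction
    relative = 100.0 * absolute / rmse_no_prediction
    return absolute, relative


def implied_baseline(absolute_points, relative_percent):
    """Percent RMSE without prediction implied by the two quoted reductions."""
    return 100.0 * absolute_points / relative_percent


baseline = implied_baseline(11.5, 44.0)  # implied "No prediction" percent RMSE
with_prediction = baseline - 11.5        # implied "Prediction" percent RMSE
print(f"No prediction: about {baseline:.1f} %")
print(f"Prediction:    about {with_prediction:.1f} %")
print(reductions(baseline, with_prediction))
```

Under these assumptions, the quoted reductions correspond to a percent RMSE of roughly 26% without prediction and roughly 15% with prediction on the branch where the improvement is largest.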
As for the reactive power estimations, Figure 13 shows that they are mainly affected, locally, by the prediction of the industrial loads, since the reactive power forecast is also available for them. A reduction of the percent RMSEs larger than 13 percentage points is obtained at branches 12, 13, and 16, while it is larger than 15 percentage points (the estimation error is more than halved in this case) for branch 6. The test results highlight that the performance of the distribution state estimation improves significantly when the active and reactive powers forecasted by the neural predictors are introduced as pseudomeasurements, instead of the power consumptions measured at the same hour of the day before. The improvements are more evident for branches close to larger loads.

Conclusions
Neural network load forecasting models have been shown to produce reliable input information for a distribution state estimator, overcoming the problem of limited and time-delayed SM measurements or of temporary failures in the communication system. In order to improve the accuracy of the state estimation, different requirements have been fulfilled: (i) the neural models are able to forecast both active and reactive powers simultaneously with limited errors, starting from both exogenous and historical measurements; (ii) the correlation between the forecasted real and reactive power errors has been determined, which provides significant information for the state estimation algorithm; (iii) a closed-loop information flow allows the load forecasting, and hence the state estimation, even when real measurement data are missing, by replacing them with forecasted values; (iv) to build the weighting matrix needed by the state estimation algorithm effectively, the variance of the pseudomeasurements can be updated at fixed intervals by considering the measurements and forecast data obtained in the meantime.
The test results show that introducing pseudomeasurements forecasted by the neural predictors significantly improves the DSSE. More importantly, the improvements are more evident for the branches that are close to larger predicted loads, where the percent RMSEs of the power and current estimations are almost halved.
In summary, the proposed approach can be used for the state estimation of medium-voltage distribution networks that are either underdetermined, due to limited real-time measurements, or overdetermined but with delayed measurements from SMs.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.