A data-driven measurement placement to evaluate the well-being of distribution systems operation

The widespread integration of intelligent electronic devices has facilitated the employment of data mining methods in evaluating the operating condition of distribution systems. This possibility comes to prominence in active networks, where distributed energy resources can cause unforeseen dynamics that require an effective monitoring infrastructure and a fast-track procedure to convey the system operating condition to the operator in a comprehensible manner. To this end, a data-driven approach is proposed to assess the status of system operating constraints by presenting each constraint as a classification problem. Afterwards, by exploiting this representation of the system operating condition, the measurement placement problem in distribution systems is addressed as selecting the set of features that contribute most to evaluating the system operating status. To do so, first, the effectiveness of the measurement units is identified through their contribution to the classification process, and then a procedure is proposed to pinpoint the measurement units with redundant information. Monte-Carlo simulations are performed to provide a comprehensive training set. Receiver operating characteristic analysis and time-series power flows demonstrate the effectiveness of the proposed approaches.


INTRODUCTION
To ensure secure operation, the operating condition of a distribution system should be monitored regularly. Generally, state estimation (SE) techniques are employed for this purpose, by defining a set of variables as the system states and minimizing the variance of the estimation error [1]. While these techniques have been successful in transmission systems, in practice they have not been widely applied in distribution management systems due to the lack of sufficient measurements [2]. Nowadays, with the widespread integration of intelligent electronic devices in distribution systems and the development of the smart grid, more real-time measurements are available [2], which enables the distribution system operator to benefit from a wide variety of data mining methods in evaluating the well-being of the system. With these methods, it is possible to provide intelligent yet fast-track models to evaluate and convey the system operating condition in a comprehensible manner. This advantage is especially beneficial in active networks, considering that the integration of distributed energy resources (DERs) may cause unforeseen dynamics [3] that require prompt responses from the operator [4]. Since the secure operating conditions of a distribution system are predefined in the grid code, classification techniques are of particular interest among the various data mining methods.
Up to now, the application of classification techniques has been widely examined to identify the operating condition of transmission systems, either by mapping this issue onto a classification problem, or by employing the classification techniques to model the system security rules. On this subject, discriminant analysis (DA), core vector machine, and support vector machine techniques have been used to recognize the insecure operation of a power system from the transient stability viewpoint in [5], [6], and [7], respectively, while in [8], the performance of different classification techniques, i.e. decision tree, ensemble decision tree, and support vector machine, has been compared for this purpose. The time-series shapelet method, combined with the decision tree technique, has been used to assess the short-term voltage stability in [9]. In [10], DA, together with a neural network, has been examined to evaluate the system security. In [11] and [12], the decision tree technique has been employed to model the power system security rules, while in [13], deep auto-encoders have been deployed to address this problem.
Unlike transmission networks, in distribution systems, the application of data mining methods in evaluating the system operating condition has been mostly confined to increasing the accuracy of the SE, either by providing a better estimation of the loads [14][15][16], or finding a better initialization point [17]. Only a few studies have been conducted to employ these methods to directly address the condition assessment issue. As an example, in [18], a neural network has been used to directly estimate the voltage profile of the medium voltage (MV) and low voltage (LV) sections, or in [19], six distinct states have been defined for the system and the hierarchical classification technique has been employed to distinguish among these states.
In addition to the evaluation procedure, an effective monitoring infrastructure is necessary to provide the data required to assess the system operating condition. These data could come from smart devices such as DER inverters, or from the voltage/current/flow measurement units connected to the grid by means of voltage/current transformers. These data could also be the consumption information at the intelligent substations, provided by aggregating the information of the smart meters at the LV level. Here, all of these sources are referred to as measurement units. In this regard, effective measurement placement schemes have been discussed in the literature to provide the optimal location and number of measurement units, together with the type of measurements [20]. Baran et al. [21] were among the pioneers who addressed this issue in distribution systems, by proposing a set of rules to increase the SE accuracy. In [22], a robust optimization algorithm is deployed to minimize the total estimation variance of nodal voltages in different network topologies. A Fisher information-based approach has been suggested in [23] to solve the optimal measurement placement problem. In [24], flow measurements are placed to increase the accuracy of the branch current estimation. The optimal measurement placement has been settled in [20], [25], and [26] by translating this problem to bounding the SE uncertainty.
As noted, in most of the existing studies, the optimal measurement placement problem in distribution systems has been addressed as minimizing or confining the SE uncertainty. When data mining approaches are employed to directly evaluate the system operating condition, this problem can be presented as selecting the minimum set of system features that grants an adequate accuracy for the data mining approach, which lays the base for the method adopted here.
In this study, evaluating the distribution system operating condition is addressed by presenting the system operating constraints as a set of binary classification problems. The measurements throughout the network are utilized as the features to differentiate between the binary classes, and DA is applied as the classifier. In this way, a comprehensible measure is provided for the operator to quantify the well-being of the system operation and identify the susceptible areas in the network. Additionally, based on the proposed condition evaluation technique, a measurement placement plan for distribution systems is proposed involving two steps: singling out the effective measurement units, based on their contribution to the classification process; and pinpointing the units containing redundant information, by deploying the mutual information (MI) and differential entropy (DE) concepts. The main contributions of this work include the following:
• The proposed approach for evaluating the distribution system operating condition provides a simple evaluation process that requires a much lower processing time compared to the conventional SE approaches. This enables it to be employed in real-time applications.
• This approach can provide higher accuracy than the conventional SE approaches, as it dedicates all the resources to finding the status of the important states only.
• This approach is applicable to balanced and unbalanced, MV or LV networks and can treat any system operating constraint.
• Since the contribution of each measurement unit is evaluated regardless of its type or configuration, these units could be of any type. This includes single- or three-phase configurations as well.
• The proposed measurement placement scheme delivers optional solutions, meaning that it provides the accuracy of the estimation that is achievable by each combination of measurement units. This makes it possible to assess the trade-off between the quality of the SE and the investment it requires.
This paper is organized as follows. Section 2 elaborates the proposed method for evaluating the distribution system operating condition and also formulates the problem of measurement placement. Section 3 discusses the results of applying the propounded approaches to the IEEE 123 node test feeder. Section 4 concludes the paper.

METHODOLOGY
Different constraints are established to ensure that the distribution system is operating securely, e.g. over- or under-voltage limits for nodes, thermal limits for components, reverse power flows for lines, etc. In this section, assessing the status of each system constraint (or a combination of system constraints) is presented as a binary classification problem, with the first class representing the violation of that constraint and the second class standing for the normal operation of the system regarding that constraint. Measurements throughout the network are considered as the system features to be utilized by the classifier. In the offline phase, different operating points of the system are simulated to train the classifier in recognizing the pattern leading to the realization of each class. This classifier can then be employed in real time to assess the status of system constraints, based on online measurements.

DA to evaluate system operating condition
Among various classification methods, DA is an easy-to-implement technique that has already shown its applicability in settling different issues in power system studies, from load forecasting [27], load modelling [28], and fault detection [29] to assessing the power system security [5,10]. DA describes each class with a unique multivariate normal distribution and employs the properties of these distributions to distinguish among the classes. Although a normal distribution is presumed, DA is robust against the violation of this presumption [30], especially when system features are continuous and bounded [31].
Let $X = (x_{1j})$, $j = 1, \dots, n$, contain the values of the $n$ measurements in the network. Although not investigated in this study, these measurements could include pseudo-measurements (historical data) too. Denote by $f_{1r}(X)$ and $f_{2r}(X)$ the probability density functions of the instances in which the $r$th system constraint (or any of a set of constraints) is violated and of the instances in which the system is operating under normal conditions regarding that constraint, respectively. Together, they are denoted as $f_{kr}(X)$, $k = 1, 2$. DA assumes a multivariate normal distribution, $\mathcal{N}_n(M_{kr}, \Sigma_{kr})$, for each class, with the probability density function (with $X$ and $M_{kr}$ treated as row vectors)

$$f_{kr}(X) = \frac{1}{\sqrt{(2\pi)^n\,|\Sigma_{kr}|}}\exp\left(-\frac{1}{2}\,(X - M_{kr})\,\Sigma_{kr}^{-1}\,(X - M_{kr})^{T}\right) \qquad (1)$$

where $M_{kr}$ is the mean and $\Sigma_{kr}$ is the covariance of the $k$th normal distribution for the $r$th system constraint. Considering (1), according to the Bayes theorem, the probability of $X$ belonging to the $k$th class for the $r$th system constraint, $p_{kr}(X)$, is given by [32]

$$p_{kr}(X) = \frac{\pi_{kr}\,f_{kr}(X)}{\sum_{l=1}^{2}\pi_{lr}\,f_{lr}(X)} \qquad (2)$$

where $\pi_{kr}$ denotes the prior probability of associating with the $k$th class. In DA, the class with the largest probability is selected in order to minimize the expected number of misclassifications [32].
If sufficient measurements are available, the proposed approach can be employed as described. In the event that the measurements are deficient, the proposed presentation of the distribution system observability can be used to settle the optimal measurement placement problem. To do so, the measurements from all the nominated units are utilized to form the classifier, and then the optimal combination of measurement units is obtained by discarding the ineffective and redundant measurements.
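As an illustration of Equations (1) and (2), the following minimal sketch evaluates the probability of constraint violation for one measurement vector, given the class means, covariances, and priors. The function names and the two-measurement numbers are hypothetical and chosen only for illustration; they are not taken from the study, which used Matlab.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Logarithm of the multivariate normal density of Equation (1)."""
    n = mean.size
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)   # (x - M) Sigma^{-1} (x - M)^T
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + maha)

def violation_probability(x, means, covs, priors):
    """Posterior probability of class 1 (violation), Equation (2)."""
    log_joint = np.array([np.log(p) + gaussian_logpdf(x, m, c)
                          for m, c, p in zip(means, covs, priors)])
    log_joint -= log_joint.max()               # avoid underflow in exp
    joint = np.exp(log_joint)
    return joint[0] / joint.sum()

# Hypothetical example: two voltage measurements in pu (illustrative values only)
means = [np.array([0.94, 0.95]), np.array([0.98, 0.99])]   # class 1, class 2
covs = [np.diag([1e-4, 1e-4]), np.diag([1e-4, 1e-4])]
priors = [0.3, 0.7]

p_low = violation_probability(np.array([0.93, 0.94]), means, covs, priors)
p_high = violation_probability(np.array([0.99, 1.00]), means, covs, priors)
print(p_low, p_high)
```

Working with log-densities is a design choice of this sketch: it keeps the computation stable when the covariance determinant is very small.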

Assessing effectiveness of measurement units
The effectiveness of each unit is assessed based on its contribution to the classification problem. To this end, one effective way is to investigate how the variations of each measurement influence the probability of belonging to each class. A criterion that well demonstrates this sensitivity is the magnitude of the derivative of that probability with respect to that measurement. Since $p_{1r} + p_{2r} = 1$, we have $|\partial p_{1r}/\partial x| = |\partial p_{2r}/\partial x|$ for any $r$ and $x$, and therefore it does not matter which class is chosen; here, we consider $|\partial p_{1r}/\partial x|$. This criterion demonstrates to what extent that measurement ($x$) can change the status of that constraint ($r$).
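As some units may contain multiple measurements, the effectiveness of each unit, i.e. $Y = [y_1, \dots, y_c]$, can be represented by the gradient of $p_{1r}$ with respect to its contained measurements as

$$\nabla p_{1r} = \left[\frac{\partial p_{1r}}{\partial y_1}, \dots, \frac{\partial p_{1r}}{\partial y_c}\right] \qquad (3)$$

where $\nabla$ denotes the gradient operator. It is shown in the Appendix section that $\nabla p_{1r}$ can be calculated using (1) and (2) as

$$\nabla p_{1r} = p_{1r}\,p_{2r}\left[(X - M_{2r})\,\Sigma_{2r}^{-1} - (X - M_{1r})\,\Sigma_{1r}^{-1}\right]_{Y} \qquad (4)$$

where $[\cdot]_{Y}$ keeps the entries associated with the measurements in $Y$. Since $\nabla p_{1r}$ is a vector and also depends on the system operating point, the mean of the 2-norm of $\nabla p_{1r}$ over all the data points in the training set, denoted by $\overline{\|\nabla p_{1r}\|_2}$, is employed as the benchmark to assess the effectiveness of each unit regarding each constraint.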
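The effectiveness benchmark can be sketched numerically as follows. The closed form used here, $\nabla p_1 = p_1 p_2\,[\Sigma_2^{-1}(X - M_2) - \Sigma_1^{-1}(X - M_1)]$, is obtained by differentiating the two-class Bayes posterior with Gaussian class densities; it is our own derivation and may differ in form from the expression in the paper's Appendix. The function names, the three-measurement example, and the grouping of columns into units are hypothetical.

```python
import numpy as np

def posterior_and_gradient(x, means, covs, priors):
    """Return p_1(x) and its gradient with respect to all measurements."""
    scores, pulls = [], []
    for m, c, p in zip(means, covs, priors):
        diff = x - m
        pull = np.linalg.solve(c, diff)                 # Sigma_k^{-1} (x - M_k)
        _, logdet = np.linalg.slogdet(c)
        scores.append(np.log(p) - 0.5 * (logdet + diff @ pull))
        pulls.append(pull)
    scores = np.array(scores) - max(scores)             # common constants cancel
    p1 = np.exp(scores[0]) / np.exp(scores).sum()
    grad = p1 * (1.0 - p1) * (pulls[1] - pulls[0])
    return p1, grad

def unit_effectiveness(samples, unit_columns, means, covs, priors):
    """Mean 2-norm of the gradient restricted to the columns of one unit."""
    norms = [np.linalg.norm(
                 posterior_and_gradient(x, means, covs, priors)[1][unit_columns])
             for x in samples]
    return float(np.mean(norms))

# Hypothetical data: the third measurement has the same statistics in both classes
means = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 2.0, 1.0])]
covs = [np.eye(3), np.eye(3)]
priors = [0.4, 0.6]
samples = np.random.default_rng(0).normal(size=(200, 3)) + 0.5

eff_unit_a = unit_effectiveness(samples, [0, 1], means, covs, priors)
eff_unit_b = unit_effectiveness(samples, [2], means, covs, priors)
print(eff_unit_a, eff_unit_b)
```

In this toy case the unit holding only the third measurement scores zero, which is the behaviour expected of an ineffective unit.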
While identified as effective, some units may contain redundant information regarding the classification process. In theory, an increase in the information redundancy could decrease the misclassification rate, but it should be investigated whether this is economically reasonable. In this regard, the concepts of MI and DE are exploited in the following to settle the trade-off between minimizing the number of measurement units and minimizing the evaluation error, by relating the amount of information redundancy in each unit to the amount of accuracy it provides. This relationship is employed to pinpoint the minimum set of units that grants a requested accuracy.

Redundancy and the concept of differential entropy
Suppose the measurements are divided into groups a and b: group b includes the measurements from a specific unit (or a specific group of units) to be discarded, and group a includes the remaining measurements. Accordingly, $X$ in (1) is divided into $X_a$ and $X_b$. To measure the redundancy between the measurements of groups a and b, one effective index is the MI, defined as [33]

$$I(X_a; X_b) = \iint f_{(X_a, X_b)}(x_a, x_b)\,\ln\frac{f_{(X_a, X_b)}(x_a, x_b)}{f_{X_a}(x_a)\,f_{X_b}(x_b)}\,dx_a\,dx_b \qquad (5)$$

where $I(X_a; X_b)$ denotes the MI between $X_a$ and $X_b$, and $f_{X_a}$, $f_{X_b}$, and $f_{(X_a, X_b)}$ are the probability density functions of $X_a$, $X_b$, and their joint probability density function, respectively. The concept of MI is linked to the DE of random variables.
The DE represents the uncertainty in a continuous random variable and is given by [34]

$$h(X) = -\int_{S} f_X(x)\,\ln f_X(x)\,dx \qquad (6)$$

where $h(X)$ and $S$ denote the DE of $X$ and its support set.
Considering (5) and (6), it can be concluded that [33]

$$I(X_a; X_b) = h(X_a) + h(X_b) - h(X_a, X_b) \qquad (7)$$

where $h(X_a, X_b)$ is the entropy of the joint probability density function of $X_a$ and $X_b$. Since $h(X_a, X_b)$ is the entropy of the complete measurement set and does not depend on how it is partitioned, according to (7), to maximize the MI between the discarded measurements, $X_b$, and the remaining measurements, $X_a$, $X_b$ should be chosen to maximize $h(X_a) + h(X_b)$. For an $n$-variate normal distribution, $\mathcal{N}_n(M, \Sigma)$, the DE is given by [34]

$$h(X) = \frac{n}{2} + \frac{n}{2}\ln(2\pi) + \frac{1}{2}\ln|\Sigma| \qquad (8)$$

where $|\Sigma|$ represents the determinant of $\Sigma$. Considering (7) and (8) together, we have

$$I(X_a; X_b) = \frac{1}{2}\ln\frac{|\Sigma_a|\,|\Sigma_b|}{|\Sigma|} \qquad (9)$$

where $\Sigma_a$ and $\Sigma_b$ denote the covariance matrices associated with $X_a$ and $X_b$, respectively. $\Sigma_a$ is given by omitting the rows and columns of $\Sigma$ corresponding to $X_b$. In addition to having a large MI between $X_a$ and $X_b$, the redundant measurements, $X_b$, should be pinpointed such that the remaining measurements, $X_a$, preserve most of the information, or in other words, have the largest DE value. According to (8), for a particular number of redundant measurements (equivalently, a particular number of remaining measurements), $h(X_a)$ increases monotonically with $|\Sigma_a|$. Therefore, for a particular number of redundant measurements, to simultaneously preserve most of the information and have a large MI, the best combination to be considered redundant is the one that leads to the largest value of $|\Sigma_a|$. Consequently, since the determinant of a covariance matrix (which is symmetric and positive semi-definite) is equal to the product of its singular values, the best practice to discard the redundant measurements is to omit the rows and columns of $\Sigma$ such that this leads to the elimination of the lowest singular values of $\Sigma$.
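For jointly normal measurements, the quantities in (7)-(9) depend only on covariance determinants, so they can be computed directly from a covariance matrix. The sketch below is a minimal illustration; the function names and the 2 × 2 example are hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a multivariate normal, as in Equation (8)."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * n * (1.0 + np.log(2.0 * np.pi)) + 0.5 * logdet

def gaussian_mutual_information(cov, b_idx):
    """MI between the measurements in b_idx and all remaining ones, Equation (7)."""
    n = cov.shape[0]
    a_idx = [i for i in range(n) if i not in b_idx]
    cov_a = cov[np.ix_(a_idx, a_idx)]      # drop rows/columns of group b
    cov_b = cov[np.ix_(b_idx, b_idx)]
    return gaussian_entropy(cov_a) + gaussian_entropy(cov_b) - gaussian_entropy(cov)

# Hypothetical example: two strongly correlated measurements
rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])
mi = gaussian_mutual_information(cov, [1])
print(mi)
```

For two unit-variance variables with correlation $\rho$, the result should agree with the textbook value $-\tfrac{1}{2}\ln(1 - \rho^2)$, which makes a convenient sanity check.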
Next, based on this conclusion, two benchmarks are presented to assess the relative error of discarding the redundant measurements as a function of the number and combination of the selected measurements.

Pinpointing redundant measurements
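Suppose $\Sigma$ ($n \times n$) is the covariance matrix for the first class of one of the system operating constraints. As $\Sigma$ is symmetric, its singular value decomposition is given by

$$\Sigma = U\,S\,U^{T} \qquad (10)$$

where $U$ ($n \times n$) is an orthogonal matrix and $S$ ($n \times n$) is a diagonal matrix that contains the singular values of $\Sigma$ in descending order. To eliminate the lowest singular values of $\Sigma$, let us divide $S$ into $S_a$ ($(n - q) \times (n - q)$) and $S_b$ ($q \times q$), such that $S_b$ contains the $q$ singular values that are relatively small enough to be discarded (later, it will be discussed how small). Accordingly,

$$\Sigma = \begin{bmatrix} U_a & U_b \end{bmatrix}\begin{bmatrix} S_a & 0 \\ 0 & S_b \end{bmatrix}\begin{bmatrix} U_a^{T} \\ U_b^{T} \end{bmatrix} \qquad (11)$$

where $U_a$ and $U_b$ contain the first $n - q$ and the last $q$ columns of $U$, respectively. It can be concluded from Equation (11) that

$$U_b^{T}\,\Sigma = S_b\,U_b^{T} \qquad (12)$$

If $\Sigma_i$ denotes the $i$th row of $\Sigma$, Equation (12) can be rewritten as

$$\sum_{i=1}^{n} u^{b}_{ij}\,\Sigma_i = \sigma^{b}_{j}\,(u^{b}_{j})^{T}, \quad j = 1, \dots, q \qquad (13)$$

where $u^{b}_{ij}$ is the $(i, j)$th element of $U_b$, $u^{b}_{j}$ is its $j$th column, and $\sigma^{b}_{j}$ is the $j$th diagonal element of $S_b$. If, as presumed, the values of $\sigma^{b}_{j}$ are small enough, it can be concluded from Equation (13) that

$$\sum_{i=1}^{n} u^{b}_{ij}\,\Sigma_i \approx 0, \quad j = 1, \dots, q \qquad (14)$$

Equation (14) implies that $q$ rows of $\Sigma$ are linearly dependent on the other $n - q$ rows. Because each row of $\Sigma$ is associated with one measurement, Equation (14) relates the problem of finding the redundant measurements to finding the linearly dependent rows of the covariance matrix. In what follows, the error associated with selecting each combination of the rows of $\Sigma$ as the dependent rows is presented. Since we are looking for the minimum number of units to be installed, it is desirable to discard as many rows as possible, provided that the error remains below a certain limit. Hence, this error provides a yardstick to decide how many of the rows and which combination should be discarded.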
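Suppose $\Sigma_m, \Sigma_{m+1}, \dots, \Sigma_{m+q-1}$ are chosen as the $q$ linearly dependent rows. Considering Equation (13), we have

$$\begin{bmatrix} \Sigma_m \\ \vdots \\ \Sigma_{m+q-1} \end{bmatrix} = -F^{-1}\,G \begin{bmatrix} \Sigma_1 \\ \vdots \\ \Sigma_{m-1} \\ \Sigma_{m+q} \\ \vdots \\ \Sigma_n \end{bmatrix} + F^{-1}\,S_b\,U_b^{T} \qquad (15)$$

where $F$ and $G$ are defined as (16) and (17), respectively, with $u^{b}_{ij}$ denoting the $(i, j)$th element of $U_b$:

$$F = \begin{bmatrix} u^{b}_{m,1} & \cdots & u^{b}_{m+q-1,1} \\ \vdots & & \vdots \\ u^{b}_{m,q} & \cdots & u^{b}_{m+q-1,q} \end{bmatrix} \qquad (16)$$

$$G = \begin{bmatrix} u^{b}_{1,1} & \cdots & u^{b}_{m-1,1} & u^{b}_{m+q,1} & \cdots & u^{b}_{n,1} \\ \vdots & & \vdots & \vdots & & \vdots \\ u^{b}_{1,q} & \cdots & u^{b}_{m-1,q} & u^{b}_{m+q,q} & \cdots & u^{b}_{n,q} \end{bmatrix} \qquad (17)$$

It should be noted that, for simplicity of presentation, consecutive rows of $\Sigma$ were chosen here, while any combination of $q$ out of the $n$ rows of $\Sigma$ could be selected.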
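The row-dependence argument can be checked numerically. The sketch below assumes the reading that $F$ and $G$ collect the entries of $U_b$ belonging to the discarded and the retained rows, respectively; under that assumption, the discarded rows equal $-F^{-1}G$ times the retained rows plus the error term $F^{-1}S_bU_b^{T}$. The function name and the 3 × 3 covariance matrix are hypothetical, and an eigendecomposition is used because it coincides with the singular value decomposition for a symmetric positive semi-definite matrix.

```python
import numpy as np

def dependent_row_model(cov, dep_rows):
    """Express the rows in dep_rows through the remaining rows of cov."""
    n, q = cov.shape[0], len(dep_rows)
    w, U = np.linalg.eigh(cov)                 # ascending eigenvalues
    order = np.argsort(w)[::-1]                # reorder to descending
    s, U = w[order], U[:, order]
    U_b, S_b = U[:, n - q:], np.diag(s[n - q:])
    keep = [i for i in range(n) if i not in dep_rows]
    F = U_b[dep_rows, :].T                     # q x q
    G = U_b[keep, :].T                         # q x (n - q)
    coeffs = -np.linalg.solve(F, G)            # -F^{-1} G
    error = np.linalg.solve(F, S_b @ U_b.T)    # F^{-1} S_b U_b^T
    return keep, coeffs, error

# Hypothetical covariance: third measurement is nearly the sum of the first two
base = np.array([[1.0, 0.0, 1.0],
                 [0.0, 2.0, 2.0],
                 [1.0, 2.0, 3.0]])
cov = base + 1e-4 * np.eye(3)

keep, coeffs, error = dependent_row_model(cov, [2])
approx_row = coeffs @ cov[keep, :]
print(coeffs, np.linalg.norm(error, 2))
```

Here the recovered coefficients are close to one for both retained rows, reflecting that the third measurement carries almost no information beyond the first two.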

Error of discarding redundant measurements
Considering (15), the error of discarding redundant measurements, denoted by $E$, equals $F^{-1} S_b U_b^{T}$, which is a $q \times n$ matrix. To present the magnitude of the error as a scalar number, the 2-norm of this matrix is considered. By definition, the 2-norm of a given matrix $A$, denoted by $\|A\|_2$, equals its largest singular value, $\sigma_{\max}(A)$. From the properties of the 2-norm, if $A$, $B$, and $C$ are matrices of compatible sizes, we have

$$\|A\,B\,C\|_2 \le \|A\|_2\,\|B\|_2\,\|C\|_2 \qquad (18)$$

Based on the above, $\|F^{-1}\|_2$ equals $1/\sigma^{F}_{\min}$, where $\sigma^{F}_{\min}$ is the smallest singular value of $F$, $\|U_b^{T}\|_2$ equals 1 (since $U$ is orthogonal), and $\|S_b\|_2$ equals $\sigma_{n-q+1}$. Using Equation (18), an upper limit for $\|E\|_2$ can be provided as

$$\|E\|_2 \le \frac{\sigma_{n-q+1}}{\sigma^{F}_{\min}} \qquad (19)$$

Defining the relative error of discarding the redundant measurements as the ratio of $\|E\|_2$ to $\|\Sigma\|_2$, since $\|\Sigma\|_2$ equals $\sigma_1$, the upper limit for the relative error is given by

$$\frac{\|E\|_2}{\|\Sigma\|_2} \le \frac{\sigma_{n-q+1}}{\sigma_1}\cdot\frac{1}{\sigma^{F}_{\min}} = E_1\,E_2 \qquad (20)$$

Equation (20) presents two important benchmarks regarding the identification of redundant measurements:
• The first benchmark, $E_1 = \sigma_{n-q+1}/\sigma_1$, relates the number of measurements that are considered redundant ($q$) to the cap of the relative error: the more measurements considered redundant, the larger the value of $\sigma_{n-q+1}$ and, hence, the larger the relative error cap.
• The second benchmark, $E_2 = 1/\sigma^{F}_{\min}$, provides guidance for identifying the redundant measurements. This benchmark suggests that, to minimize the cap of the relative error, the combination of measurements leading to the minimum value of $1/\sigma^{F}_{\min}$, i.e. the largest $\sigma^{F}_{\min}$, should be selected.
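The two benchmarks can be compared against the actual relative error for every candidate combination. The following sketch assumes $E_1 = \sigma_{n-q+1}/\sigma_1$ and $E_2 = 1/\sigma^{F}_{\min}$, as read from the surrounding text; the function name and the synthetic four-measurement data set are hypothetical.

```python
import itertools
import numpy as np

def redundancy_errors(cov, q):
    """Actual relative error and its cap E1*E2 for each set of q discarded rows."""
    w, U = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1]
    s, U = w[order], U[:, order]                # descending singular values
    U_b, S_b = U[:, -q:], np.diag(s[-q:])
    e1 = s[-q] / s[0]                           # sigma_{n-q+1} / sigma_1
    results = {}
    for rows in itertools.combinations(range(cov.shape[0]), q):
        F = U_b[list(rows), :].T
        sigma_f_min = np.linalg.svd(F, compute_uv=False)[-1]
        if sigma_f_min < 1e-12:                 # F is numerically singular
            continue
        E = np.linalg.solve(F, S_b @ U_b.T)     # F^{-1} S_b U_b^T
        actual = np.linalg.norm(E, 2) / s[0]    # ||E||_2 / ||Sigma||_2
        results[rows] = (actual, e1 / sigma_f_min)
    return results

# Hypothetical data: four measurements driven by two latent quantities plus noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.0, 1.0, 1.0],
                   [0.0, 1.0, 1.0, -1.0]])
data = latent @ mixing + 0.01 * rng.normal(size=(500, 4))
cov = np.cov(data, rowvar=False)

results = redundancy_errors(cov, q=2)
best = min(results, key=lambda rows: results[rows][0])
print(best, results[best])
```

Enumerating all combinations is only practical for a small number of candidate units, which matches the handful of nominated locations considered in the case study.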

RESULTS AND DISCUSSIONS
The proposed approaches are applied to the IEEE 123 node test feeder. This system is unbalanced, with a nominal voltage of 4.16 kV and contains different types of loads. To study the impact of DERs, the network is modified by integrating a wind power plant, a solar power plant, and a diesel generator, with the rated power of 200, 200, and 400 kW, respectively (DER18, DER57, and DER76, respectively). Figure 1 depicts the schematic diagram of the test system [35].

Evaluation of operating condition
To explore different operating conditions, different set points are considered for the DERs by sweeping their active power generation from zero to the rated power, and the reactive power generation from the maximum leading to the maximum lagging power factor. At each set point, 7000 Monte-Carlo simulations are conducted to capture the probable variations of the system loads. A value higher than one, 1.12, is chosen for the mean of the Monte-Carlo simulations to put the system under further stress. Analyzing the Irish standard load profiles, developed by the Electricity Supply Board (an Irish energy utility) for the Irish market in the year 2019 [36], shows that the relative standard deviation of these load profiles varies between 0.25 and 0.42, where the relative standard deviation is defined as the ratio of the standard deviation to the mean. Therefore, 0.5 is selected here as the standard deviation for the Monte-Carlo simulations to include all the load variations. This value corresponds to a relative standard deviation of about 0.45, considering that 1.12 is selected as the mean. Note that these load profiles are also employed for conducting time-series power flows in the following subsections. While the proposed approach could be employed to investigate any operating condition, in this study, the focus is on the under- and over-voltage and over-current constraints. Results of the Monte-Carlo simulations show that while no over-voltage condition (voltages above 1.05 pu) is observed, under-voltage (voltages below 0.95 pu) occurs in 26.6% and 35.2% of the simulations in at least one of the phases of one of the nodes located in areas NE and CR in Figure 1, respectively. In addition, in 15.3% of the simulations, the over-current condition occurs in at least one of the phases of the line that connects bus 76 to bus 72, L76 (the current surpasses 140 A).
Denote by V-NE and V-CR the value of the lowest single-phase voltage of all the nodes located in NE and CR, respectively, and by I-L76 the highest value of the single-phase current of L76, since the over-current constraint is violated as soon as any phase exceeds its limit. Three pairs of classes are formed based on the values of V-NE, V-CR, and I-L76 to represent the system operating constraints. The first classes represent the violation of each constraint and the second classes stand for the normal operation regarding each constraint.
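The formation of the three class pairs can be sketched as follows, using the limits stated in the text (0.95 pu for under-voltage and 140 A for the current of L76). The function name is hypothetical, and the per-sample values of V-NE, V-CR, and I-L76 are assumed to have been extracted from the simulations beforehand; the numbers in the example are illustrative only.

```python
import numpy as np

V_MIN_PU = 0.95    # under-voltage limit stated in the text
I_MAX_A = 140.0    # over-current limit of line L76 stated in the text

def constraint_classes(v_ne_pu, v_cr_pu, i_l76_a):
    """Return class 1 (violation) or class 2 (normal) for each constraint."""
    return {
        "V-NE": np.where(np.asarray(v_ne_pu) < V_MIN_PU, 1, 2),
        "V-CR": np.where(np.asarray(v_cr_pu) < V_MIN_PU, 1, 2),
        "I-L76": np.where(np.asarray(i_l76_a) > I_MAX_A, 1, 2),
    }

# Two hypothetical simulation samples
classes = constraint_classes([0.94, 0.97], [0.96, 0.93], [150.0, 120.0])
print(classes)
```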
In practice, it is not feasible to install measurement units at every desired location. For this reason, four locations are nominated for the voltage measurements (V44, V60, V97, and V105) and three locations are considered for the current measurements (I13, I60, and I97), while DERs may be equipped with voltage and flow measurements as well (voltage of the point of common connection and active and reactive power generation). The main substation is also assumed to be equipped with flow measurements (SUB). The measurements from the nominated units are utilized as the system features for the classification. DA is used as the classifier and 90% of the simulations are used for training; as a result, a normal distribution is fitted for each class. The remaining 10% are used to form the test set. The proposed DA approach is programmed in Matlab R2019a and simulations are performed on a PC with an Intel(R) Core(TM) i7-8700K 3.7 GHz CPU and 16 GB of RAM. It takes around 22 s to build the DA model for each system constraint and 40 ms to predict using the trained models for each data sample. Figure 2 presents the results of applying DA to the test set. This figure depicts the probability of the violation of each system constraint, calculated using Equation (2), versus the actual constraint values. As expected, for most of the data points, the calculated probabilities are higher than 0.5 for the values beyond the constraints and lower than 0.5 for the values within them. Another noticeable trend is that as the actual values get farther from the constraint values, the probability of the constraint violation grows or shrinks accordingly, implying that the calculated probability reflects the intensity of the constraint violation and, therefore, can be used to quantify the well-being of the system operating condition.

ble for all the system constraints and, hence, they are discarded from the list of candidate units.

Identification of redundant measurement units
As mentioned, Equation (14) links the problem of finding redundant measurement units to finding the linearly dependent rows of the covariance matrix. To implement Equation (14), the nine smallest singular values of the covariance matrices of different system constraints, together with their associated eigenvectors, are evaluated. Figures 4-6 show the absolute value of the elements of these eigenvectors for constraints V-NE, V-CR, and I-L76, respectively. Note that in these figures, for the sake of clarity, values beyond 0.15 are not shown. Also, they are plotted on a logarithmic scale. First, a similar pattern is recognizable for all the constraints in these figures. This is mainly because the topology of the network was assumed unchanged and, hence, the violation of each system constraint does not affect the covariance between each pair of the measurements to a great extent. Second, as the elements associated with the measurements of V60, V97, V105, SUB, and I13 have large values, it can be concluded that redundancy exists among the measurements of these units. Although the elements associated with the DER57 voltage measurements have large values, those associated with the DER57 flow measurements have small values and, hence, DER57 is omitted from the list of redundant units. Next, at each step, a subset of V60, V97, V105, and I13 is discarded and the related error ($\|E\|_2/\|\Sigma\|_2$) and the estimated cap of the relative error ($E_1 E_2$) are evaluated using Equations (19) and (20), respectively. Figure 7 presents the results. It can be remarked that $E_1 E_2$ provides a conservative estimate of the relative error. It is also noticed that for the combinations involving one or two of the units, the relative error is close to zero. On the other hand, for some combinations involving three units, the relative error can take very large values; however, among them, the one that includes V97, V105, and I13 has the smallest relative error (3%). Therefore, the measurements of these units are considered redundant, leaving the optimal units as SUB, V60, DER57, DER18, and DER76.

FIGURE 6
Absolute value of the elements of $\Sigma_{1r}$ eigenvectors, associated with the nine smallest $\Sigma_{1r}$ eigenvalues, for the constraint I-L76
In order to validate the provided results, the measurement placement approach of [37], which is based on the conservation voltage reduction concept, is implemented on the test feeder. This approach employs the maximum of the standard deviation of the estimated voltages of the system nodes, obtained from an SE approach, as a yardstick to evaluate the effectiveness of the measurement placement scheme. This yardstick is denoted by $\sigma_m$. Since this paper only focuses on the voltages of the nodes placed in the susceptible areas of Figure 1, $\sigma_m$ is calculated only over these nodes. Since, besides the substation, our approach suggested four of the measurement units, namely V60, DER57, DER18, and DER76, as the best combination, we calculate $\sigma_m$ for every selection of four units out of the effective units V60, V97, V105, DER57, DER76, DER18, and I13. The least $\sigma_m$ represents the best measurement placement scheme. Figure 8 depicts the results, which agree with those of Figure 7, as the least value of $\sigma_m$ is achieved with the combination that includes V60, DER57, DER18, and DER76 (and hence excludes V97, V105, and I13).

Receiver operating characteristic analysis
Receiver operating characteristic (ROC) analysis is employed to assess the impact of discarding the ineffective and redundant units on the DA performance in evaluating the system operating condition. This analysis is a prominent tool to assess the performance of classifiers [38]. The classification threshold is the limit that is used to decide whether an observation, $X$, belongs to a class, $k$, considering the probability of that observation belonging to that class, $p_k(X)$ in (2). In binary classification, 0.5 (50%) is generally selected as the classification threshold, which represents an unbiased decision. In ROC analysis, besides 0.5 (50%), other possible values from 0 (0%) to 1 (100%) are considered as the probability threshold in (2), and the true-positive rate (the fraction of actual members of the class that are correctly classified) and the false-positive rate (the fraction of non-members that are incorrectly assigned to the class) are calculated for each probability threshold. Afterwards, an ROC curve is formed for each class by depicting the true-positive rate versus the false-positive rate. In this way, the area under each ROC curve (AUC) reflects the overall performance of the classifier regarding each class [38]. For a perfect classifier, the ROC curve reaches the top left corner, with an AUC of one, while smaller values indicate a departure from the ideal classifier, such that an AUC of 0.5 represents a completely random classifier. Figure 9 plots the ROC curves for different system constraints with three models: the basic model that involves all the nominated units, the model that includes only the effective units, and the model obtained after discarding the ineffective and redundant units. In all the cases, the AUC is above 0.93, showing that DA performs close to the ideal classifier for this problem. In addition, discarding the ineffective and redundant units has not noticeably impacted the performance of the classification, which validates the proposed scheme for pinpointing the optimal measurement units.
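The threshold sweep behind an ROC curve and its AUC can be sketched as follows. This is a generic illustration rather than the routine used in the study; the function names and the four-sample example are hypothetical. Thresholds are taken at the distinct predicted probabilities, with minus and plus infinity added so that the curve always runs from (0, 0) to (1, 1).

```python
import numpy as np

def roc_points(p_violation, is_violation):
    """False-positive and true-positive rates for every distinct threshold."""
    p = np.asarray(p_violation, dtype=float)
    y = np.asarray(is_violation, dtype=bool)
    thresholds = np.concatenate(([-np.inf], np.unique(p), [np.inf]))
    fpr, tpr = [], []
    for t in thresholds:
        flagged = p >= t                          # classified as violation
        tpr.append(np.sum(flagged & y) / np.sum(y))
        fpr.append(np.sum(flagged & ~y) / np.sum(~y))
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    order = np.lexsort((tpr, fpr))                # sort by fpr, then by tpr
    f, t = fpr[order], tpr[order]
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))

# Hypothetical predictions: one violation sample is ranked below a normal sample
probabilities = [0.9, 0.4, 0.6, 0.1]
violations = [True, True, False, False]
fpr, tpr = roc_points(probabilities, violations)
auc_value = area_under_curve(fpr, tpr)
print(auc_value)
```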
For the constraints V-NE and V-CR, the AUC of the basic model is slightly smaller than that of the two other models. This is due to the reduction in the numerical error of calculating the inverse of the covariance matrix in the latter cases, since with the basic model, the determinant of the covariance matrix is very close to zero (e.g. $3 \times 10^{-135}$ in the case of V-NE, which improves to $9 \times 10^{-96}$ after discarding the ineffective units).

FIGURE 10
Calculated probability of constraint violation versus the actual value of V-NE. The blue hue represents correct evaluation of the system operating condition, while the red hue represents incorrect evaluation (as normal, while the actual condition is abnormal, or as abnormal, while the actual condition is normal)

FIGURE 11
Calculated probability of constraint violation versus the actual value of V-CR. The blue hue represents correct evaluation of the system operating condition, while the red hue represents incorrect evaluation (as normal, while the actual condition is abnormal, or as abnormal, while the actual condition is normal)

Time-series power flows
Time-series power flows are conducted to investigate the performance of the proposed approaches under a realistic scenario. For this purpose, five of the aforementioned Irish standard load profiles [36] are allocated randomly to the system loads, and the generation profiles of wind and solar power plants in a specific year in Ireland are considered for DER18 and DER57, respectively. DER76, a diesel generator, is assumed to be operated for load-flattening purposes. Consequently, its generation is determined by the net feeder load 24 h earlier: it is zero for values under 3000 kW, equal to its rated power for values above 4500 kW, and linear in between. The model with the optimal units is utilized to evaluate the operating condition of the system. Figures 10-12, associated with the V-NE, V-CR, and I-L76 constraints, respectively, depict the calculated probability of constraint violation versus the actual value of the system constraints for the 8760 h of a year. As can be seen, the proposed approach has accurately predicted the condition of the system in most of the hours, and the calculated probability increases with the severity of the violation, verifying that this measure can be employed as an index to describe the well-being of the system. Table 1 presents the rate of correct evaluation of the system operating condition, together with the rate of incorrect evaluation (as normal, while the actual condition is abnormal, or as abnormal, while the actual condition is normal) for the time-series power flow simulations. As seen, the error rate is below 2.5% for all the constraints. These results are consistent with the results of Figure 9, as the best performance is achieved for V-NE, and then I-L76 (total of 1% and 1.9% error, respectively). In order to benchmark the performance of the proposed DA approach, the weighted least square (WLS) SE approach [39] is implemented on the test feeder.
To this end, the magnitudes and phase angles of the system node voltages are considered as the system state variables. These variables are then used to evaluate the system operating condition with regard to the V-NE, V-CR, and I-L76 constraints. In the same computing environment mentioned earlier, this approach takes around 60 s to estimate all the state variables for each data sample. Compared with the 40 ms required by the proposed DA approach, this establishes the superiority of our approach in terms of speed. Table 2 presents the rate of correct and incorrect evaluation of the system operating condition, using the WLS approach on the results of the time-series power flows. Compared to the results provided in Table 1, it can be noted that a higher level of accuracy is provided by the proposed DA approach, especially for predicting the status of the I-L76 constraint. This superiority can be explained by the fact that the proposed DA approach concentrates on estimating only the important states, especially when they approach the system constraints, while the conventional SE methods do not prioritize the state variables.
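The rates reported in Tables 1 and 2 can be computed from the hourly results in a few lines. The sketch below assumes that an hour is evaluated as abnormal when the calculated violation probability reaches a threshold of 0.5. This threshold, the function name, and the dictionary keys are illustrative assumptions and are not taken from the paper.

```python
def evaluation_rates(probabilities, actual_violation, threshold=0.5):
    """Percentage of correct and incorrect evaluations of one constraint.

    probabilities: calculated probability of violation for each hour.
    actual_violation: True where the constraint is actually violated.
    threshold: assumed decision threshold on the probability.
    """
    if len(probabilities) != len(actual_violation):
        raise ValueError("inputs must have the same length")
    if not probabilities:
        raise ValueError("inputs must not be empty")

    correct = as_normal_while_abnormal = as_abnormal_while_normal = 0
    for probability, violated in zip(probabilities, actual_violation):
        evaluated_abnormal = probability >= threshold
        if evaluated_abnormal == violated:
            correct += 1
        elif violated:
            as_normal_while_abnormal += 1
        else:
            as_abnormal_while_normal += 1

    total = len(probabilities)
    return {
        "correct": 100.0 * correct / total,
        "as_normal_while_abnormal": 100.0 * as_normal_while_abnormal / total,
        "as_abnormal_while_normal": 100.0 * as_abnormal_while_normal / total,
    }
```

Applied to the 8760 hourly samples of one constraint, the sum of the two incorrect rates corresponds to the total error discussed in the text.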

CONCLUSION
A data-driven approach was proposed to deal with SE in distribution systems, by addressing the system condition through the status of its operating constraints. To this end, each system operating constraint was presented as a binary classification problem, and DA was employed as the classifier to evaluate the probability of violation of each constraint. In this way, a comprehensible measure was provided for the operator to assess the well-being of the system operation and identify the susceptible areas in real time. A comparison showed that this approach is much faster than the conventional SE methods, which makes it a suitable choice for real-time applications. It also provides higher accuracy in evaluating the system operating condition. While only the low-voltage and over-current conditions were studied in this paper, the approach is capable of investigating any operating constraint, including over-voltage, reverse power flow, faults, etc. Next, a scheme for measurement placement in distribution systems was proposed. This was achieved first by assessing the contribution of each measurement unit to the classification problem and then by deploying the MI and DE concepts to pinpoint and discard the units with redundant information. This approach delivers a set of alternative solutions, which makes it possible to assess the trade-off between the quality of SE and the investment it requires. The conservation voltage reduction meter placement approach validated the results provided by this measurement placement scheme.