A Complex Fault Diagnostic Approach of Active Distribution Network Based on SBS-SFS Optimized Multi-SVM

After renewable energy distributed generators (DGs) are connected to the power grid, traditional fault diagnosis approaches based on diverse electrical information are no longer suitable for active distribution networks (ADNs) because of the weak characteristics of the fault current. Thus, this paper proposes a comprehensive, formula-free fault diagnostic approach for ADNs that uses only voltage as input. In preprocessing, sequential forward selection (SFS) and sequential backward selection (SBS) are used to optimize the input feature matrix of the samples in order to reduce the information redundancy of the multiple measuring points in an ADN. Then, a single "1-a-1" support vector machine (SVM) classifier is used for fault identification, and a multi-SVM with a radial basis function (RBF) kernel is applied to identify the fault location and fault type. To prove that the proposed method is adaptable to ADNs, two direct-drive wind turbines are used as DGs in the IEEE 33-node model, and faults are tested at every 10% of the line length under three operating conditions that cover all cases of distributed generation in the ADN. Results comparing real-time and historical data show that the proposed multi-SVM model reaches an average fault type diagnosis accuracy of 97.27% and a fault identification accuracy of 96%. A backpropagation neural network is then compared with the proposed model, and the results show the superior performance of the SBS-SFS optimized multi-SVM. This model can be usefully applied to fault diagnosis of distribution networks with distributed new-energy generation.


Introduction
As a significant part of the Ubiquitous Power Internet of Things (UPIOT), a distribution network is a complex framework that distributes electric power to numerous customers according to their voltage level. Complex faults in the distribution network, such as short circuits between lines, disrupt normal operation and increase maintenance costs [1]. Thus, there is an urgent need for fault detection systems that prevent deterioration of the stability and security of distribution networks. Several researchers [2][3][4][5] have worked on fault prediction in distribution networks. Clean energy sources, including photovoltaic, wind, and hydropower, have been developed rapidly to reduce the long-standing reliance on fossil fuels [6]. These developments have driven the integration of wind turbines (WTs) into active distribution networks (ADNs) in recent years [7]. However, WTs and other equipment connected to a distribution network act as distributed generators (DGs) [8], which have complex structures, flexible operation, and uncertain outputs. The main reason is that the control strategies adopted by their power electronic interfaces limit the magnitude of the fault current, which is therefore relatively small and difficult to express with formulas. These features undermine traditional fault diagnosis methods based on one-way power flow characteristics, creating an urgent demand for diagnostic methods suitable for ADNs with multiple DGs [9].
An ideal fault diagnostic system should include state identification, fault type identification, and fault location [10]. State identification in a distribution network relies on threshold setting or logical judgment, as in three-stage current protection [11] or low-voltage protection, and none of these methods take the effect of DGs in ADNs into account. The fault types considered in distribution grids generally include single-phase, two-phase, and three-phase short circuits to ground, as well as two-phase short circuits. Several researchers use current, direction, and other information to determine the fault type, but their judgment process requires cutting off the entire power network [12], which is not suitable for ADNs with renewable DGs. There are two primary fault location methods for ADNs: one improves traditional three-stage current protection using information uploaded by a distribution automation system, and the other uses wide-area measurement information [13]. Existing approaches cover a broader range of fault areas in distribution grids and increase the accuracy of fault location. Although progress has been made, state identification, fault classification, and fault location each rely on different quantities of the distribution grid, so fault diagnosis in ADNs can be time-consuming. A comprehensive method that realizes ADN fault status discrimination, fault type classification, and fault location with fewer electrical quantities is still lacking.
Machine learning techniques driven by massive sensor data are emerging and have previously been applied to grid fault diagnosis [14]. Commonly used artificial intelligence methods include artificial neural networks (ANNs) [15], Petri nets [16], and extreme learning machines (ELMs) [17]. The support vector machine (SVM) [18] is a major classification algorithm that divides the sample space into regions and uses the boundaries between them to classify data. The samples nearest to the boundaries are called support vectors [19]. SVM determines its boundaries under the one-against-one ("1-a-1") rule, in which each binary classifier separates one class from another and multiclass decisions are made by voting; a single such classifier is well suited to two-state problems such as state identification in distribution networks. Compared with other artificial intelligence algorithms, the final decision function of SVM is determined by only a few support vectors, which gives it an advantage over algorithms whose convergence is closely tied to the dimension of the sample space. However, SVM alone is probably not feasible for fault type and fault location diagnosis owing to the high dimensionality of the data features. To solve this problem, the feature dimensions of the data should be reduced. Sequential forward selection (SFS) and sequential backward selection (SBS) select the features that best reflect the characteristics of the data, and using both is advantageous because the data undergo multiple screening processes. Inspired by machine learning theory, this paper proposes a complex fault diagnostic approach for ADNs that uses only voltage information. Specifically, single SVM classifiers are applied to state identification, while multi-SVM classifiers optimized by SFS and SBS are used to classify fault types and locate faults. The proposed approach is then applied to an IEEE 33-node model with two DGs to test its reliability.
This paper is organized as follows: Section 2 presents the methodology of using SBS and SFS to select the most suitable feature matrix for SVM, together with a flowchart of the proposed model. Section 3 describes the implementation of the proposed method for ADN fault diagnosis, including a structural flowchart. Section 4 applies the SBS/SFS multi-SVM to an ADN with two DGs and tests the model with simulated voltage data from an IEEE 33-node model to demonstrate its effectiveness; fault location for diverse fault types is also presented in that section. Section 5 provides the results of the experiments.

SVM.
Proposed by Vapnik [20], SVM is a machine learning method based on statistical learning theory. SVM maps the training input data into a high-dimensional feature space through an inner-product (kernel) function [21]. For a single SVM classifier, a linear decision surface with the maximum margin of separation between the two input classes is constructed in the mapped space. New input data can then be assigned to a category according to the feature matrix, so the matrix that best reflects the characteristics of the input data is selected.
A sample dataset $X = \{X_1, X_2, \ldots, X_N\}$ contains two classes, described by labels $Y_i \in \{-1, 1\}$ that represent the corresponding binary class. Thus, the training data can be written as $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$. For linearly separable data, the decision function can be given by Vapnik [22]:

$$f(X) = w \cdot X + b$$

Here, $w$ is the normal vector of the hyperplane and $b$ is a constant. These two coefficients determine the location of the optimal hyperplane, which maximizes the margin between the two classes. For points $X_1$ and $X_2$ lying on $w \cdot X + b = 1$ and $w \cdot X + b = -1$, respectively, the margin between them is

$$\frac{w}{\|w\|} \cdot (X_1 - X_2) = \frac{2}{\|w\|}.$$
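The margin expression can be checked numerically. The short Python sketch below uses a hypothetical hyperplane with $w = (3, 4)$ and $b = -2$, so $\|w\| = 5$; it picks one point on each margin plane and confirms that their separation along the unit normal equals $2/\|w\| = 0.4$. The function names are illustrative only.

```python
import math


def norm(v):
    """Euclidean norm ||v||."""
    return math.sqrt(sum(c * c for c in v))


def margin_width(w):
    """Width 2/||w|| of the band between w.x + b = 1 and w.x + b = -1."""
    return 2.0 / norm(w)


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)


# Hypothetical hyperplane: w = (3, 4), b = -2, so ||w|| = 5.
w, b = [3.0, 4.0], -2.0
x1 = [1.0, 0.0]   # 3*1 + 4*0 - 2 = +1    -> lies on w.x + b = +1
x2 = [0.0, 0.25]  # 3*0 + 4*0.25 - 2 = -1 -> lies on w.x + b = -1

# Projection of (x1 - x2) onto the unit normal w/||w||.
projected_gap = sum(wi * (p - q) for wi, p, q in zip(w, x1, x2)) / norm(w)
print(margin_width(w), projected_gap)
```

Either route, the width formula or the projected gap, gives the same margin, which is why maximizing the margin is equivalent to minimizing $\|w\|$.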
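Finding the optimal hyperplane (maximum margin) can be treated as a quadratic programming problem:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad Y_i\,(w \cdot X_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, N,$$

where $\xi_i$ is a nonnegative slack variable and $C$ is the penalty (loss) factor. This problem can be solved with the Lagrange method, whose function is

$$L(w, b, \xi, a, \mu) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N} a_i\left[Y_i\,(w \cdot X_i + b) - 1 + \xi_i\right] - \sum_{i=1}^{N}\mu_i\,\xi_i,$$

where $a_i$ and $\mu_i$ are Lagrange multipliers. The function should meet the following constraints:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} a_i Y_i X_i,\qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} a_i Y_i = 0,\qquad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow C - a_i - \mu_i = 0,\qquad a_i,\ \mu_i \ge 0.$$

The kernel function measures the similarity (inner product) of two input vectors in the nonlinearly transformed feature space, which corresponds to the mapping process shown in Figure 1.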
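With the nonlinear mapping $\varphi(X)$, the problem can be restated in dual form as

$$\max_{a}\ \sum_{i=1}^{N} a_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j Y_i Y_j\,\varphi(X_i)\cdot\varphi(X_j) \quad \text{s.t.}\quad \sum_{i=1}^{N} a_i Y_i = 0,\ \ 0 \le a_i \le C.$$

Writing the kernel as $K(X_i, X_j) = \varphi(X_i)\cdot\varphi(X_j)$, the decision function takes the brief form

$$f(X) = \operatorname{sgn}\left(\sum_{i=1}^{N} a_i Y_i K(X_i, X) + b\right).$$

Current research focuses on finding better kernel functions as the main direction for improving SVM. The most commonly used kernel function is the radial basis function (RBF), $K(X_i, X_j) = \exp\left(-\|X_i - X_j\|^2 / (2\sigma^2)\right)$.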
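As an illustration of the dual decision function with the RBF kernel, the Python sketch below evaluates $f(X) = \sum_i a_i Y_i K(X_i, X) + b$ for a hypothetical two-sample problem whose multipliers can be derived by hand: with one sample per class placed symmetrically, $b = 0$ and $a_1 = a_2 = 1/(1 - K(X_1, X_2))$, assuming $C$ is large enough that the box constraint $a_i \le C$ is inactive. It is not a trainer, and the kernel width $\sigma = 1$ is an assumption.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Dual-form SVM output: sum_i a_i * Y_i * K(X_i, x) + b."""
    return sum(
        a * y * rbf_kernel(sv, x, sigma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b


def classify(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Class label = sign of the decision value (a value of 0 maps to +1)."""
    value = decision_value(x, support_vectors, labels, alphas, b, sigma)
    return 1 if value >= 0 else -1


# Hypothetical two-sample problem, one sample per class.
X = [[1.0, 0.0], [-1.0, 0.0]]
Y = [1, -1]
k12 = rbf_kernel(X[0], X[1])   # exp(-4 / 2) = exp(-2)
a = 1.0 / (1.0 - k12)          # maximizes the dual objective 2a - a^2 (1 - k12)
alphas, b = [a, a], 0.0

print(decision_value(X[0], X, Y, alphas, b))  # approximately 1.0: X_1 lies on the margin
print(classify([0.5, 0.3], X, Y, alphas, b))
```

Only the samples with nonzero $a_i$ (the support vectors) contribute to the sum, which is the property that keeps the SVM decision function compact.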

SBS and SFS
2.2.1. Feature Selection Rules of SVM. Before SVM classification, the original data should be expressed concisely but accurately because of their high dimensionality. The universal remedy is feature selection and extraction, which searches all properties to reduce the number of dimensions. The selection criterion in SVM is based on classification accuracy; however, this criterion is not reliable when samples are limited and feature dimensions are high. For a hyperplane $f(X) = w \cdot X + b$, the normal vector $w$ is one of the key factors determining classification performance. Its quality can be assessed with a within-class scatter matrix $S_w$ and a between-class scatter matrix $S_b$ [23]:

$$S_w = \sum_{i=1}^{c}\sum_{X \in \omega_i}(X - \mu_i)(X - \mu_i)^{T},\qquad S_b = \sum_{i=1}^{c} N_i\,(\mu_i - \mu)(\mu_i - \mu)^{T},$$

where $\mu_i$ represents the sample mean of class $\omega_i$ (which contains $N_i$ samples, with $c$ classes in total) and $\mu$ is the mean of the total sample. A separability criterion $J_1$ is computed from these two matrices.
The higher the value of $J_1$, the better the classification performance of $w$. Thus, the best feature matrix can be determined.
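As a hedged illustration of such a criterion, the Python sketch below uses the trace ratio $J = \operatorname{tr}(S_b)/\operatorname{tr}(S_w)$, one common scatter-based choice that is assumed here and may differ from the exact form of $J_1$. It shows that a feature separating two classes scores far higher than a noise feature. The sample values and function names are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def trace_ratio(samples, labels):
    """J = tr(S_b) / tr(S_w).

    tr(S_w) = sum over classes i of sum_{X in class i} ||X - mu_i||^2
    tr(S_b) = sum over classes i of N_i * ||mu_i - mu||^2
    """
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    mu = mean_vector(samples)
    tr_sw = 0.0
    tr_sb = 0.0
    for rows in groups.values():
        mu_i = mean_vector(rows)
        tr_sw += sum(sq_dist(x, mu_i) for x in rows)
        tr_sb += len(rows) * sq_dist(mu_i, mu)
    return tr_sb / tr_sw


def columns(samples, idx):
    """Keep only the feature columns listed in idx."""
    return [[x[k] for k in idx] for x in samples]


# Hypothetical samples: feature 0 separates the classes, feature 1 is noise.
samples = [[1.0, 0.3], [1.2, -0.2], [0.9, 0.1],
           [3.0, 0.2], [3.1, -0.1], [2.9, 0.0]]
labels = [0, 0, 0, 1, 1, 1]

j_good = trace_ratio(columns(samples, [0]), labels)
j_noise = trace_ratio(columns(samples, [1]), labels)
print(f"J(feature 0) = {j_good:.2f}, J(feature 1) = {j_noise:.4f}")
```

Working with traces avoids inverting $S_w$, which can be singular when samples are few and dimensions are high, the very situation described above.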

SFS and SBS.
SFS and SBS are feature traversal (greedy search) procedures that, based on leave-one-out cross-validation, select the features that improve model classification performance. SFS starts its search from an empty set, adds one property to the feature set at a time, tests the classification performance of the resulting set, and repeats until the best classification accuracy is reached. The pseudocode of SFS is shown in Algorithm 1 [24].
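The SFS loop can be sketched as follows. This is a minimal Python illustration, assuming a user-supplied `score` callable that returns the classification accuracy of a feature subset (for example, SVM accuracy under leave-one-out cross-validation); the feature names and the toy scoring rule are hypothetical. The sketch keeps adding features until none remain and returns the best subset visited, which is one common way to implement the stopping rule.

```python
def sfs(features, score):
    """Sequential forward selection.

    features: candidate feature names (assumed unique).
    score: callable(tuple_of_features) -> accuracy, higher is better.
    Returns (best_subset, best_score) among all subsets visited.
    """
    remaining = list(features)
    selected = []
    best_subset, best_score = (), float("-inf")
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        trial = {f: score(tuple(selected + [f])) for f in remaining}
        chosen = max(remaining, key=trial.__getitem__)  # first best on ties
        selected.append(chosen)
        remaining.remove(chosen)
        if trial[chosen] > best_score:
            best_subset, best_score = tuple(selected), trial[chosen]
    return best_subset, best_score


# Hypothetical scoring rule: informative features raise accuracy, every
# feature costs a little (mimicking redundancy), and "noise" adds nothing.
WEIGHTS = {"Ua-Ub": 0.30, "Ub-Uc": 0.30, "Uc-Ua": 0.20, "U0": 0.15, "noise": 0.0}


def toy_accuracy(subset):
    return min(1.0, 0.05 + sum(WEIGHTS[f] for f in subset)) - 0.01 * len(subset)


best, acc = sfs(WEIGHTS, toy_accuracy)
print(best, round(acc, 4))
```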
In contrast to SFS, SBS starts with the entire feature set and deletes one property after each test until optimal classification performance is reached.
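A matching Python sketch of SBS is shown below, under the same assumptions: `score` is a user-supplied accuracy function, and the feature names and toy scoring rule are hypothetical. At each step it deletes the feature whose removal keeps the score highest and remembers the best subset seen.

```python
def sbs(features, score, min_features=1):
    """Sequential backward selection.

    Starts from the full feature set and removes one feature per step,
    always dropping the feature whose removal keeps the score highest.
    Returns (best_subset, best_score) among all subsets visited.
    """
    selected = list(features)
    best_subset, best_score = tuple(selected), score(tuple(selected))
    while len(selected) > min_features:
        # Score every subset obtained by deleting exactly one feature.
        trial = {f: score(tuple(g for g in selected if g != f)) for f in selected}
        dropped = max(selected, key=trial.__getitem__)
        selected.remove(dropped)
        if trial[dropped] > best_score:
            best_subset, best_score = tuple(selected), trial[dropped]
    return best_subset, best_score


# Hypothetical scoring rule (same idea as for SFS).
WEIGHTS = {"Ua-Ub": 0.30, "Ub-Uc": 0.30, "Uc-Ua": 0.20, "U0": 0.15, "noise": 0.0}


def toy_accuracy(subset):
    return min(1.0, 0.05 + sum(WEIGHTS[f] for f in subset)) - 0.01 * len(subset)


best, acc = sbs(WEIGHTS, toy_accuracy)
print(best, round(acc, 4))
```

On this toy rule both directions reach the same four-feature subset; on real data they can disagree, which is one reason to run both.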

SBS/SFS Modified SVM.
Based on the above description of SFS, SBS, and SVM, our proposed method for ADN fault diagnosis is built as follows. After the original data are loaded into the model, the classification accuracy of SVM is calculated; if it does not meet the requirement, SBS and SFS are employed to obtain an optimal M-dimensional feature matrix. In each test, one property is added to or deleted from the current feature set, and the new dataset is used to recompute the SVM accuracy. The process continues until the accuracy exceeds 95%. The flowchart of the SBS- and SFS-optimized SVM is shown in Figure 2.
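One possible reading of this loop, written as a hedged Python sketch: starting from the full feature set, each round tries every single deletion (SBS step) and every single re-insertion of a previously removed feature (SFS step), keeps the move with the highest accuracy, and stops once accuracy exceeds the 95% target or no move helps. The `accuracy` callable stands in for the SVM accuracy test; the feature names and the toy accuracy rule are hypothetical.

```python
def optimize_features(all_features, accuracy, target=0.95, max_rounds=100):
    """Alternate SBS deletions and SFS additions until accuracy > target.

    all_features: full list of candidate features (assumed unique).
    accuracy: callable(tuple_of_features) -> accuracy in [0, 1].
    Returns (feature_tuple, accuracy).
    """
    current = tuple(all_features)
    acc = accuracy(current)
    for _ in range(max_rounds):
        if acc > target:
            break  # requirement met
        moves = []
        if len(current) > 1:
            # SBS step: delete one feature.
            moves += [tuple(f for f in current if f != d) for d in current]
        # SFS step: add back one feature that is not currently used.
        moves += [current + (f,) for f in all_features if f not in current]
        if not moves:
            break
        scored = [(accuracy(m), m) for m in moves]
        best_acc, best_move = max(scored, key=lambda t: t[0])
        if best_acc <= acc:
            break  # no single add/delete improves accuracy
        current, acc = best_move, best_acc
    return current, acc


# Hypothetical accuracy rule: four informative features and two noise
# features that each cost 10 percentage points.
USEFUL = {"Ua-Ub": 0.30, "Ub-Uc": 0.30, "Uc-Ua": 0.20, "U0": 0.15}


def toy_accuracy(subset):
    noise = sum(1 for f in subset if f not in USEFUL)
    return 0.02 + sum(USEFUL.get(f, 0.0) for f in subset) - 0.10 * noise


features, acc = optimize_features(list(USEFUL) + ["noise1", "noise2"], toy_accuracy)
print(features, round(acc, 4))
```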

Model Establishment.
The proposed method achieves the three aims of ADN fault diagnosis separately. Specifically, the operating state of an ADN is either a fault state or a normal state. The most common fault types are single-phase short to ground, two-phase short, two-phase short to ground, and three-phase short. The number of candidate fault locations depends on the number of nodes in the ADN.
In this work, the input information is derived from voltage rather than from the switching states of circuit breakers or other information. The fault state, fault type, and fault location are the outputs of this model.

Data Reduction.
Sensors in ADNs can obtain various kinds of voltage information, including phase voltages and their fundamental-frequency and harmonic components. The proposed model uses this diverse information to aid fault diagnosis.

State Identification.
To clarify the relationship between input and output information, we establish an example ADN, as shown in Figure 3. Here, $V_s$ denotes the power source on the system side, the numbers are the serial numbers of the measurement points, and f1, f2, and f3 mark fault locations on the line.
Mathematical Problems in Engineering
When a fault occurs at location f1, the amplitudes of the low-frequency voltage components deviate markedly from those under normal operation. As Figure 4 shows, the fundamental-frequency voltage component of the ADN differs by 2 kV between normal operation and the fault condition.
There is no obvious difference in amplitudes at other frequencies. Because sensor accuracy and the choice of measurement points vary across an ADN, in practice it is necessary to select measurement points that accurately describe the fault and nonfault conditions and to use the voltages obtained there as input information to determine the operating status. Because the amplitude of the fundamental-frequency voltage in the nonfault state of the ADN is much larger than in the fault state, it is selected as the characteristic quantity for judging the fault status.
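When the state is judged from a single fundamental-frequency amplitude, a linear SVM reduces to a threshold. The Python sketch below, using hypothetical amplitudes in kV, computes the hard-margin boundary, which for one separable feature lies midway between the lowest normal-state sample and the highest fault-state sample. It illustrates only the geometry; the state identification reported later uses SFS/SBS-optimized feature vectors.

```python
def max_margin_threshold(fault_values, normal_values):
    """Hard-margin linear SVM on one feature.

    With a single feature and separable classes, the maximum-margin
    boundary is the midpoint between the closest samples of the two classes.
    Assumes fault amplitudes lie below normal amplitudes.
    """
    upper_fault = max(fault_values)
    lower_normal = min(normal_values)
    if upper_fault >= lower_normal:
        raise ValueError("classes overlap: a soft-margin SVM would be needed")
    return (upper_fault + lower_normal) / 2.0


def is_fault(fundamental_amplitude, threshold):
    """Fault state if the fundamental-frequency amplitude is below the boundary."""
    return fundamental_amplitude < threshold


# Hypothetical fundamental-frequency voltage amplitudes in kV.
normal_kv = [12.5, 12.6, 12.4, 12.55]
fault_kv = [10.1, 9.8, 10.4, 10.2]
threshold = max_margin_threshold(fault_kv, normal_kv)
print(threshold, is_fault(10.3, threshold), is_fault(12.45, threshold))
```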

Fault Type Diagnosis.
Single-phase short to ground, two-phase short, two-phase short to ground, and three-phase short faults are closely related to the phase voltage components. Taking phase A as an example, when a short circuit between phases A and B occurs, the voltages of phases A and B are equal, i.e., the voltage between the two phases is 0. Similarly, the voltage differences among phases A, B, and C are 0 when an ABC three-phase short circuit occurs. The voltage difference for each fault type is shown in Figure 5. The zero-sequence voltage is nonzero when a short to ground occurs, allowing easy identification. Therefore, the phase voltage differences and the zero-sequence voltage can be used to determine the fault type.
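The features named above can be computed from phase-voltage phasors. The Python sketch below builds the vector $[\,|U_a - U_b|,\ |U_b - U_c|,\ |U_c - U_a|,\ |U_0|\,]$ with $U_0 = (U_a + U_b + U_c)/3$ for three idealized per-unit cases: balanced normal operation, a bolted AB short, and an A-to-ground fault with unchanged healthy phases. These phasor values are simplifying assumptions for illustration, not simulation results.

```python
import cmath
import math

# 120-degree rotation operator "a" of symmetrical components.
A = cmath.exp(2j * math.pi / 3)


def fault_type_features(ua, ub, uc):
    """Return [|Ua-Ub|, |Ub-Uc|, |Uc-Ua|, |U0|] from complex phase voltages."""
    u0 = (ua + ub + uc) / 3.0  # zero-sequence voltage
    return [abs(ua - ub), abs(ub - uc), abs(uc - ua), abs(u0)]


# Idealized per-unit phasors (assumptions, not simulation data).
normal = fault_type_features(1.0, A ** 2, A)        # balanced abc sequence
ab_short = fault_type_features(-A / 2, -A / 2, A)   # bolted AB short: Ua = Ub
a_to_ground = fault_type_features(0.0, A ** 2, A)   # Ua collapses to 0

for name, feats in [("normal", normal), ("AB short", ab_short), ("A-G", a_to_ground)]:
    print(name, [round(v, 3) for v in feats])
```

The AB short zeroes $|U_a - U_b|$ without producing zero-sequence voltage, while the ground fault produces a nonzero $|U_0|$, matching the reasoning above.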

Fault Location.
When the same fault occurs at f1, f2, and f3, respectively, the low-frequency voltage component of the fault phase at the same measuring point differs significantly, as shown in Figure 5. It can be concluded from the figure that the low-frequency voltage amplitude components generated by the fault show the same trend at the three measurement points, that is, $U_{f1} > U_{f2} > U_{f3}$.
Therefore, once the operating state and the fault type have been determined, the low-frequency component of the fault-phase voltage can be used to locate the fault (Figure 6).

SBS/SFS-SVM for ADN Fault Diagnosis Platform.
The proposed complex fault diagnosis platform for ADNs is shown in Figure 7. After data are collected from the sensors, the data selection process is performed on the voltage information. Then, SBS and SFS are used to add related features and delete unrelated ones, forming the optimal M-dimensional feature matrix. The fundamental-frequency voltage is then used to monitor the operating status, and the phase voltage differences together with the zero-sequence voltage are analyzed to distinguish fault types. Finally, the low-frequency voltage components are used to determine the location of the fault.

Algorithm 1: Pseudocode of SFS.
…
For each X left in $X_{N-k}$, select the X with the greatest J according to the accuracy evaluated by SVM.
Return to the selection process.
$P_c = \{P_1, P_2, \ldots, P_C\}$.

From the 160 sets of fault data, the three-phase fundamental and zero-sequence fundamental voltage amplitudes at 34 measurement points under each operating state were obtained. The differences between the fundamental phase voltages, $U_a - U_b$, $U_b - U_c$, and $U_c - U_a$, were then calculated. These values, together with the zero-sequence fundamental voltage, form a four-dimensional characteristic vector at each measurement point.

Validation of SBS/SFS-SVM
(3) Fault Location. The AB two-phase short-circuit fault was simulated at every 1% of the line length in the 40%-60% section of each line of the ADN model. Subsequently, the phase A voltages at the measurement points within one cycle after the fault occurrence were extracted. Fourier analysis was then used to obtain the phase A fundamental and 2nd-7th harmonic voltage amplitudes at 34 measurement points in each operating state, which form the original dataset. Finally, the AB two-phase short-circuit fault simulation was performed at points every 10% of the length of each line, and the data were processed in the same way as the original multi-SVM sample set.
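The Fourier step can be illustrated with a short Python sketch that extracts the fundamental and 2nd-7th harmonic amplitudes from one cycle of a sampled waveform, using a single-bin discrete Fourier transform per harmonic order. The sampling rate (64 samples per cycle) and the synthetic waveform are assumptions chosen for illustration; a production implementation would typically use an FFT library.

```python
import cmath
import math


def harmonic_amplitudes(samples, samples_per_cycle, orders=range(1, 8)):
    """Peak amplitude of each harmonic order via a single-bin DFT.

    samples must span an integer number of fundamental cycles, and every
    requested order must stay below the Nyquist limit.
    """
    n = len(samples)
    if n % samples_per_cycle:
        raise ValueError("window must contain an integer number of cycles")
    cycles = n // samples_per_cycle
    amps = {}
    for h in orders:
        k = h * cycles  # DFT bin that holds harmonic h
        if 2 * k >= n:
            raise ValueError(f"order {h} is at or above the Nyquist limit")
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        amps[h] = 2.0 * abs(acc) / n
    return amps


# Hypothetical one-cycle phase-A waveform, 64 samples per cycle:
# fundamental 10 kV, 3rd harmonic 1.5 kV, 5th harmonic 0.5 kV.
N = 64
wave = [
    10.0 * math.sin(2 * math.pi * i / N)
    + 1.5 * math.sin(2 * math.pi * 3 * i / N + 0.3)
    + 0.5 * math.cos(2 * math.pi * 5 * i / N)
    for i in range(N)
]
amps = harmonic_amplitudes(wave, N)
print({h: round(v, 3) for h, v in amps.items()})
```

Because the window spans a whole cycle, each harmonic falls exactly on one DFT bin and the amplitude is recovered without leakage; a window that cuts a cycle short would smear energy across bins.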

State Identification.
The state identification based on the fundamental-frequency component of the voltage is shown in Figure 9. For each set of fundamental-frequency voltages, the SFS/SBS-SVM forms optimized one-dimensional and two-dimensional feature vectors. Then, the combination of these two vectors is used as input for SVM state identification. The two operating states are well separated on either side of the hyperplane, highlighting the state identification performance of the SFS/SBS-SVM method.

Figure 7: The proposed complex fault diagnostic approach for an ADN based on SBS-SFS optimized multi-SVM.

Fault Type Identification.
The average fault type classification accuracy is 97.27%, which is higher than the preset value of 95%. These results suggest that SFS and SBS play a significant role in adding related factors to and deleting unrelated factors from the original feature matrix. In addition, the program output shows that the optimal feature matrix is four-dimensional, most likely because all the phase voltage differences and the zero-sequence voltage are taken into account (Figure 10).
To demonstrate the effectiveness of the proposed SFS/SBS-modified multi-SVM method, we compare its fault type classification results with those of a backpropagation neural network, as shown in Figure 11. The backpropagation neural network does not perform well in fault type recognition: several groups of data are assigned to the wrong fault type, and its average fault type classification accuracy is 85.24%, much lower than that of our model.

Fault Location
(1) Operation Condition 1: DG1 and DG2 Are Both Connected to the Grid. Based on the simulation data with both DGs connected to the grid, the feature quantities are selected, and a multi-SVM classification model is established to locate the fault.
Taking the AB two-phase short-circuit fault as an example, because the model contains 32 line sections, the samples are divided into 32 categories, each containing 21 training samples. The optimal feature matrix is two-dimensional: as Figure 12 shows, when faults occur on different lines, the measurement sample points gather in different regions, which enhances the classification effect. Therefore, the fault location can be distinguished by the differences in this two-dimensional data. The training sample set composed of the two-dimensional feature quantity is used as input to the SVM classifier to obtain a fault discrimination SVM model, in which the sample data of different line faults are separated by hyperplanes. When a test sample is loaded into the SVM, its label is obtained, and the fault section corresponding to that label is determined. The fault section is then extended by one section upstream and one section downstream to obtain a suspected fault area. The location accuracy of the suspected fault area was found to be 95%, which meets the preset value. A three-dimensional optimal feature matrix was then developed to relocate the fault: as Figure 13 shows, the measurement sample points for different line faults again fall in different regions, and a better fault location effect can be expected. The training sample set composed of the three-dimensional feature quantity is loaded into the SVM classifier to obtain a fault localization multi-SVM model, which again produces a location accuracy of 95%, identical to the two-dimensional feature matrix results.

Figure 9: Fault type identification of an integrated SVM ADN based on the "1-a-1" method (1-a-1 voting method and frequency statistics).
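The one-against-one ("1-a-1") voting and the suspected-area extension can be sketched in Python as follows. To keep the example self-contained, each pairwise classifier is a hypothetical stand-in (the perpendicular bisector of the two class centroids) rather than a trained RBF-SVM; the voting logic is the same for any pairwise binary classifier. When extending the area upstream and downstream, section numbers are assumed to be consecutive along one feeder, which simplifies the branched IEEE 33-node topology. The sample values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_pairwise(samples_by_section):
    """One linear rule w.x + bias > 0 -> section i, for every pair (i, j).

    Stand-in for a pairwise SVM: the boundary is the perpendicular
    bisector of the two class centroids.
    """
    models = {}
    for i, j in combinations(sorted(samples_by_section), 2):
        ci = centroid(samples_by_section[i])
        cj = centroid(samples_by_section[j])
        w = [p - q for p, q in zip(ci, cj)]
        mid = [(p + q) / 2.0 for p, q in zip(ci, cj)]
        bias = -sum(wk * mk for wk, mk in zip(w, mid))
        models[(i, j)] = (w, bias)
    return models


def predict_1a1(models, x):
    """Each pairwise model casts one vote; the most-voted section wins."""
    votes = Counter()
    for (i, j), (w, bias) in models.items():
        score = sum(wk * xk for wk, xk in zip(w, x)) + bias
        votes[i if score > 0 else j] += 1
    # Highest vote count first; ties broken by the smaller section number.
    return min(votes, key=lambda s: (-votes[s], s))


def suspected_area(section, n_sections):
    """Extend the predicted section by one section upstream and downstream."""
    return [s for s in (section - 1, section, section + 1) if 1 <= s <= n_sections]


# Hypothetical 2-D feature samples (e.g. voltage amplitudes at two points).
training = {
    1: [[1.0, 0.9], [1.1, 1.0]],
    2: [[2.0, 1.9], [2.1, 2.0]],
    3: [[3.0, 3.1], [2.9, 3.0]],
}
models = train_pairwise(training)
section = predict_1a1(models, [2.05, 1.95])
print(section, suspected_area(section, n_sections=3))
```

With 32 sections, the same scheme would train 32 x 31 / 2 = 496 pairwise classifiers.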
Therefore, we suggest that the fault location limit of the SFS/SBS-SVM model under this working condition has already been reached with the two-dimensional optimal feature matrix, and increasing its dimension brings no further gain. Comparison with the backpropagation neural network: the fault location accuracy of the backpropagation neural network is shown in Figure 14, where the vertical axis represents the fault zone. Although the fault type identification is over 95%, the fault location accuracy is only 66.33%. Notably, the backpropagation neural network has distinct difficulty identifying faults that occur in section 7, and it misjudges faults occurring in sections 8 to 11 as faults in section 9, which does not meet the requirement.

(2) Operation Condition 2: DG1 On-Grid and DG2 Off-Grid.
Under the conditions that DG1 is on-grid and DG2 is off-grid, the AB two-phase short-circuit fault is simulated at every 1% of the line length in each line segment. Subsequently, the amplitude of the A-phase fundamental voltage at measurement points 4 and 15 is extracted. The two amplitudes are combined into a two-dimensional feature quantity, and the resulting samples are used to train and test the multi-SVM for the condition with DG1 on-grid and DG2 off-grid. The test accuracy for locating the suspected fault area is 100% (Figure 15).
To avoid erroneous location results, we enlarge the dimension of the optimal matrix from 1 to 2. We then simulate faults at 0%, 10%, 20%, 30%, 70%, 80%, 90%, and 100% of each line section, extract the phase A fundamental voltage amplitudes at measurement points 4 and 15, and combine them into a two-dimensional feature to form a test sample set. This set is used as input to the multi-SVM classifier to obtain a suspected fault area, and the location accuracy of the suspected fault area is again 100%.

(3) Operation Condition 3: DG1 Off-Grid and DG2 On-Grid.
Under the conditions that DG1 is off-grid and DG2 is on-grid, the AB two-phase short-circuit fault is simulated at every 1% of the line length in the 40%-60% section of each line. The phase A fundamental voltage amplitudes at measurement points 4 and 15 are extracted and combined into a two-dimensional feature, forming a sample set. These samples are then input to a multi-SVM for the condition with DG1 off-grid and DG2 on-grid, and the localization accuracy of the suspected fault area is 100% (Figure 16).
As above, we simulated faults at 0%, 10%, 20%, 30%, 70%, 80%, 90%, and 100% of each line section and extracted the A-phase fundamental voltage amplitudes at measurement points 4 and 15. The location accuracy of this test after SVM optimization is 96%.
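The sampling scheme can be made concrete with a short Python sketch that enumerates the fault positions described above: training faults every 1% within 40%-60% of each section, and test faults at 0%, 10%, 20%, 30%, 70%, 80%, 90%, and 100%. With 32 sections this yields the 21 training samples per category mentioned for Operation Condition 1. The (section, position) tuples are an illustrative representation only.

```python
def percent_positions(start, stop, step):
    """Fault positions in percent of a section's length, inclusive of stop."""
    return list(range(start, stop + 1, step))


N_SECTIONS = 32  # line sections of the IEEE 33-node feeder model

train_positions = percent_positions(40, 60, 1)  # 40%, 41%, ..., 60%
test_positions = [p for p in percent_positions(0, 100, 10) if not 40 <= p <= 60]

train_samples = [(s, p) for s in range(1, N_SECTIONS + 1) for p in train_positions]
test_samples = [(s, p) for s in range(1, N_SECTIONS + 1) for p in test_positions]

print(len(train_positions), test_positions)
print(len(train_samples), len(test_samples))
```

Keeping the test positions outside the 40%-60% training band checks that the classifier generalizes to fault positions it never saw during training.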
In summary, the SVM classification model trained on samples composed of the two-dimensional feature set provides accurate classification results, even when DGs are disconnected. Table 2 lists the fault diagnosis performance of the proposed model and the backpropagation neural network. Both algorithms identify the operating state accurately. Notably, SBS/SFS-SVM performs better in fault type classification and fault location, with both accuracies reaching at least 95%.

Comparison of Results.
Optimizing the structural parameters of the classification algorithm can increase the fault diagnosis rate. However, in an active distribution network connected to multiple wind turbines, only a few measurement points clearly characterize ADN fault anomalies. The input-matrix dimension reduction of SFS/SBS selects these key measurement points from the many available and builds the SVM input from them, which greatly improves the accuracy of fault diagnosis. Although the backpropagation neural network (BPNN) is an excellent multiclass classification algorithm, when it is applied to ADN fault diagnosis its input matrix is huge, and there is a risk of falling into the curse of dimensionality.

Conclusions
Fault diagnosis of distribution networks is vital to ensure the reliability of the power grid. Numerous diagnostic models have been applied in this area. However, they lack the combination of functions required for optimal performance, namely, state identification, fault type identification, and fault location. Existing approaches focus solely on distribution network fault diagnosis and ignore the integration of DGs into the grid.
In addition, existing methods must combine different types of electrical quantities to achieve fault diagnosis and location, which increases the difficulty of information processing. Our complex ADN fault diagnostic approach performs well in state identification, fault type identification, and fault location while relying only on voltage information.
Specifically, a single SVM is applied to distinguish the operating state of the ADN, and a multi-SVM following the "1-a-1" rule is used to identify fault types and locate faults. To better extract feature matrices for SVM, SFS and SBS were employed to improve accuracy. We validated the optimized model on an IEEE 33-node model with two WTs as DGs, tested with 160 sets of simulated data. The results show that the SFS/SBS-SVM method can reliably and accurately locate faults and determine their type.
In practice, a large number of electronic devices are connected to an ADN, which limits the fault current and obscures the failure characteristics. Future work will focus on extending fault identification and location in ADNs to more realistic, weak-current conditions. This method addresses the difficulty of capturing the short-circuit current in the ADN of an AC system, where the modulation strategy of power electronic devices limits that current. Therefore, the method is also applicable to DC systems, in which a large number of power electronic devices likewise make weak fault currents difficult to capture. Considering the differences between the IEEE 33-node model and an actual ADN, the following two situations should be discussed when applying this method in practice:
(a) Newly built ADNs. Because historical operation information is lacking, a large amount of measured data cannot be used to establish an SFS/SBS-SVM model. Therefore, an ADN model consistent with the actual system can be constructed on a simulation platform (PSCAD), and a large number of fault samples can be obtained through simulation to train the required fault diagnosis model.
(b) ADNs already in operation. The SFS/SBS-SVM model can be trained directly on the historical data obtained from the measurement points, using the phase voltage components selected in Section 3.
In conclusion, the SBS/SFS-SVM model is well suited to ADN fault diagnosis problems. It is also adaptable to classification and prediction problems in other areas, such as electrical device monitoring.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.