Fault Detection and Classification in Interconnected System with Wind Generation Using ANN and SVM


Abstract. Protective relays are installed in generation, transmission, and distribution systems for detection, classification, and estimation of faults. To match future load demand and to provide an uninterrupted power supply, the use of renewable energy sources (RES) is increasing day by day. Faults can occur in transmission lines, transformers, generators, and busbars, but the nature of these faults may change considerably when renewable energy sources are considered. This paper introduces techniques to detect and classify different faults on a transmission line in the presence of wind energy sources using efficient tools of artificial intelligence. The main challenges of fault detection in the presence of wind turbines lie in their non-linearity, uncertainty and unknown disturbances. The PSCAD/EMTDC software tool is used to simulate the power system model with RES, and the classification algorithms are implemented in MATLAB and Python. Artificial Neural Network (ANN) and Support Vector Machine (SVM) algorithms have been used to detect and classify faults on transmission lines connected with a wind energy source. The proposed technique has been validated for internal faults on the transmission line and external faults on the power system. A total of 4320 internal and external fault cases with wide variation in system parameters have been used for validation of the proposed model. The proposed model gives an overall fault zone identification accuracy of more than 99 % in the presence of a wind energy source. The validation results show that the SVM classifier performs better than the ANN in terms of efficacy and classification time.

Introduction
Renewable energy generation is increasing rapidly all over the world to meet electricity demand. Small-scale and large-scale penetration of wind and solar systems creates problems of false tripping, overreach, underreach and malfunctioning of transmission line relays. To overcome these problems at the transmission, distribution and micro-grid levels, a considerable amount of research has been carried out.
In the present era, the use of renewable energy sources to generate electricity is increasing significantly as a progressive step towards a prospective low carbon emission system [1]. Different factors affect the protection systems of a transmission line when it is integrated with renewable energy sources. The variation of wind parameters significantly affects the reach of the distance protection of the transmission line. The short circuit behavior of induction-type wind generators is completely different from that of conventional synchronous generators, which is one of the important aspects in deciding the characteristics of the distance protection. In the presence of a wind system, the distance protection characteristic is also affected by parameters such as fault location, wind speed, mutual coupling and fault resistance [1] and [2].
In the last few years, many researchers have focused on solving protection issues using numerical relays combined with signal processing and machine learning techniques. Fast and accurate fault identification and discrimination is a primary goal of a numerical protection relay. In article [3], a comprehensive review of different fault detection, classification and location techniques has been presented. That paper serves as a guideline for researchers working in this domain. Over the years, many machine learning and classification techniques have been developed, tested, and implemented in the electrical power system. A few of them are mentioned in the research article [3]. In [4], an Artificial Neural Network (ANN) based back propagation technique has been implemented. A syntactic pattern recognition function model has been used efficiently for detection of faults on a transmission line. Moreover, VHSIC Hardware Description Language (VHDL) has been implemented on a power system model for measurement of system parameters [5]. In research article [6], a Deep Neural Network has been applied for fault detection and classification. For fault detection, the researchers investigated the effect of two hyperparameters, the number of hidden layers and the number of neurons in the last hidden layer, on the performance of the networks. The authors concluded that increasing the network size did not improve the fault detection accuracy above a certain level. The authors in [7] implemented the Support Vector Machine (SVM) technique for fault detection, and the ANN technique for fault location and classification, in a 400 kV three-phase double-circuit transmission line with linear and non-linear loads, with improved accuracy. Other authors also implemented an SVM classifier on a 400 kV transmission line and achieved a fault classification accuracy of 99.5 % [8]. The data can be analyzed and classified based on an Artificial Neural Network [9].
A fuzzy interfaced scheme has been proposed in [10], which gives 99 % accuracy for fault detection. A Decision Tree method has been introduced in [11]. This method uses data from one side of the protected line, and the decision is made in less than a quarter cycle.
ANN and SVM-based approaches to real-time fault classification with high accuracy and high-speed implementation are discussed in research articles [12], [13], [14] and [15]. A modified multi-class SVM approach has been implemented and discussed in article [13] for distribution system fault detection. In [14], fault prediction in the presence of wind DG using a Python algorithm is proposed. The proposed method also reduced the time required to clear the fault in a wind-based power system network. In [16], the authors presented an adaptive reach of the numerical distance relay by considering various system parameters. In article [15], a modified multi-class SVM (MMC-SVM) technique has been used to detect and classify faults in a distribution system. The Radial Basis Function (RBF) kernel has been used to develop the MMC-SVM model. Improvement of the impedance reach of the numerical relay by adaptive setting of the quadrilateral characteristics was proposed in research paper [16]. In research article [17], the SVM technique has been used to detect and classify faults, whereas ANN-based classification has been shown in [19] and [26].
Hybrid classification based on multiple SVM models has been introduced in [20] and [21]. Classification and location of faults in a distribution network with a renewable source has been implemented in [22]. A comparison of dynamic and static models for classifying faults in a power system network has been given in [23]. Multi-resolution analysis using Stockwell's transform has been implemented for detection of LG, LL, LLG and LLLG faults in a power system network integrated with a wind energy system. The S-contour, amplitude plot and variance graph have been used to recognize the fault [25]. Decision Tree and concurrent neuro-fuzzy AI techniques have been applied in [32] for fault classification and detection on a nine-phase transmission line system. However, as stated in [32], the complexity increases with the number of phases, which reduces the accuracy of program execution.
The performance of the power system under noisy conditions has been investigated in [29], [30] and [31], in which the recorded fault signals measured at the relaying point have been contaminated with white Gaussian noise. The results show that the fault index remains higher than the threshold with the noisy signal. Therefore, the protection schemes proposed in articles [29], [30] and [31] are not affected by distortion of the recorded signal. However, the accuracy of the wavelet transform (WT) based technique is affected by high frequency noise that penetrates during decomposition of the current signals. Classification techniques based on NN, SVM and RVM are not much affected in this way. Different techniques investigated by several researchers for detection, classification, and localization of transmission line faults are described in [18] and [24].
The MHO relay is widely used in the protection of transmission lines to detect all kinds of faults. However, this relay sometimes fails to detect a high resistance fault in its own zone of protection when system and fault parameters vary. In this paper, a portion of the power system has been simulated in PSCAD, where a 100 km long transmission line has been considered. To test the MHO relay characteristic, a line to ground fault with varying fault resistance has been created at 70 km of line length (in-zone fault). The performance of distance protection by the MHO relay at fault resistances of 5 Ω, 10 Ω, and 18 Ω is shown in Fig. 1, Fig. 2, and Fig. 3, respectively. The results represent the effect of fault resistance variation on the distance protection characteristics. Because of the increasing value of fault resistance, the relay misoperates in the second or the third zone, as shown in Fig. 2 and Fig. 3, respectively, even though the fault is in the first zone. Similarly, the variation of other parameters of the power system network may weaken the performance of the relay under faulty conditions, specifically with the penetration of renewable sources in the network. This may create a problem of underreach and overreach of the protective scheme in the transmission line.
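The underreach caused by fault resistance can be illustrated with a minimal sketch of a self-polarized mho characteristic. The per-kilometre line impedance, the zone reach settings and the single-infeed model (apparent impedance equal to the line impedance up to the fault plus the fault resistance) are assumptions chosen for illustration; they are not taken from the PSCAD model or its appendix.

```python
# Hypothetical line data and relay settings; NOT the values used in the paper.
Z_PER_KM = complex(0.03, 0.30)   # ohm/km, assumed positive-sequence impedance
LINE_KM = 100.0                  # line length stated in the text
ZONE_REACH = (0.8, 1.2, 2.0)     # assumed zone 1/2/3 reach as a fraction of the line


def mho_zone(z_apparent, z_line=Z_PER_KM * LINE_KM, reaches=ZONE_REACH):
    """Return the first zone (1-based) whose mho circle contains z_apparent, else 0."""
    for zone, reach in enumerate(reaches, start=1):
        z_reach = reach * z_line
        # A self-polarized mho circle passes through the origin and the reach point.
        centre, radius = z_reach / 2, abs(z_reach) / 2
        if abs(z_apparent - centre) <= radius:
            return zone
    return 0


def apparent_impedance(fault_km, r_fault):
    """Simplified model: no remote infeed, no zero-sequence compensation."""
    return Z_PER_KM * fault_km + r_fault


zones = {rf: mho_zone(apparent_impedance(70.0, rf)) for rf in (0.0, 5.0, 10.0, 18.0)}
for rf, zone in zones.items():
    print(f"R_f = {rf:4.1f} ohm -> zone {zone}")
```

With these assumed values, the fault at 70 km stays inside zone 1 for 0 Ω and 5 Ω but is seen in zone 2 at 10 Ω and in zone 3 at 18 Ω, which is the same qualitative drift that the text describes for Fig. 1 to Fig. 3.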
The ANN and SVM techniques are presented in this article for classification of in-zone and out-of-zone faults on a transmission line. Various fault resistances, fault inception angles, load angles, and fault locations are considered in the presence of a wind generation system. The feasibility of the proposed algorithms has been tested on an IEEE 9 bus power system network with integration of a wind system at bus 3. The system model has been developed using the PSCAD/EMTDC software package. A simulation data set of 12570 cases has been generated using an automatic fault data generation model developed by the authors. Among these, 4320 simulation cases have been considered for validation of the proposed ANN and SVM techniques. Figure 4 shows a single line diagram of the IEEE 9 bus 230 kV electrical power system network considered for the simulation studies. The IEEE 9 bus system consists of power generators G1 and G2, wind system generator G3, six transmission lines, three transformers and three loads connected at buses 4, 5 and 8. The generators G1 and G2 are modeled as equivalent dynamic sources, each representing a multi-machine system, connected to buses 1 and 2, respectively. Generator G3 is a Type 3 Wind Turbine Model used as the renewable energy source (wind farm), which is intermittent in power generation. The Bergeron model with distributed parameters has been used for modelling of the transmission lines. The parameters of the generation system, transmission lines, transformers and connected loads are given in the appendix.

Proposed System Modelling
A sampling frequency of 4 kHz at 50 Hz nominal frequency has been used. The channel plot step has been taken as 250 µs, i.e. 80 samples/cycle. Post-fault data have been captured with measuring devices such as CVTs and CTs. The same configuration is normally used in digital relays available on the market. All ten types of faults on the line between bus 8 and bus 9 have been simulated at various locations with different values of fault resistance, fault inception angle and power flow angle, giving a large number of internal faults. For each case, the voltage and current values are measured and saved as a data file from the PSCAD software. In a similar way, external faults have also been simulated outside the line between bus 8 and bus 9, including locations on bus 8, on bus 9, on the line between buses 7 and 8, and on the line between buses 6 and 9, with all of the above-mentioned fault parameters.

Table 1 and Tab. 2 show the different cases of 5670 internal faults and 6900 external faults created, respectively. It is observed from Tab. 1 and Tab. 2 that, out of 5670 internal faults and 6900 external faults, 3750 (66.14 % of total internal cases) and 4500 (65.22 % of total external cases) have been utilized for training of the ANN and SVM. The remaining 1920 (33.86 % of total internal cases) and 2400 (34.78 % of total external cases) have been utilized for testing and validation of the proposed algorithm. The trained ANN and SVM based fault classifier models are then extensively tested on unseen fault data. The ANN and SVM fault detection techniques have been verified for all symmetrical and asymmetrical faults (L-G, LL, LL-G, LLL-G) at different locations. These algorithms are tested for internal and external faults with wide variation in fault resistance, Fault Inception Angle (FIA) (0° to 180°) and load flow angle.
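The sampling figures and the training/testing split quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable and function names are illustrative.

```python
F_NOMINAL_HZ = 50      # nominal system frequency
F_SAMPLING_HZ = 4000   # sampling frequency

samples_per_cycle = F_SAMPLING_HZ // F_NOMINAL_HZ   # samples in one power cycle
step_us = 1e6 / F_SAMPLING_HZ                        # time step in microseconds

# (training cases, testing cases) as stated in the text
cases = {"internal": (3750, 1920), "external": (4500, 2400)}


def split_percent(train, test):
    """Return total cases and the training/testing shares in percent."""
    total = train + test
    return total, round(100 * train / total, 2), round(100 * test / total, 2)


summary = {name: split_percent(*counts) for name, counts in cases.items()}
total_cases = sum(total for total, _, _ in summary.values())
test_cases = sum(test for _, test in cases.values())

print(samples_per_cycle, step_us)
print(summary, total_cases, test_cases)
```

The totals reproduce the 12570 simulated cases and the 4320 cases reserved for testing and validation.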

Before applying the ANN, the training and testing data sets are normalized column-wise using Eq. (1) to avoid under-fitting, which degrades the accuracy of the model. A model generally does not perform well on a raw data set, so pre-processing of the data points, including removal of noise from the data, is a prime requirement [6]. The training and testing input values $u_i$ are therefore re-scaled as shown in Eq. (1) to improve the accuracy of the algorithm in detecting and classifying faults of the power system network.
where $u_i$ denotes the input values of the post-fault sending end and receiving end voltages and currents, and $u_{\max}$ and $u_{\min}$ are the maximum and minimum values of the input column, respectively.

The human brain has a very large number of neurons, which perform many sensitive tasks. It takes signals from different parts of the body and generates the appropriate action naturally. The ANN works similarly, but it is artificial in nature. The ANN has the capability of parallel processing, nonlinear mapping, and online and offline learning. Neurons are known as nodes in the artificial system. The ANN has an input layer, a hidden layer and an output layer. The ANN process depends on the network topology, such as a feed forward single-layer or multi-layer network as shown in Fig. 5 [4], and a feedback network (weight updating or learning) as given in Fig. 6.
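The column-wise normalization of Eq. (1) can be sketched as follows, assuming that Eq. (1) is the usual min-max form $(u_i - u_{\min}) / (u_{\max} - u_{\min})$, which maps each column to the range 0 to 1. Taking $u_{\min}$ and $u_{\max}$ from the training set and reusing them for the testing set is a design choice of this sketch, not something stated in the text, and the sample values are made up.

```python
def min_max_normalize(train_rows, test_rows):
    """Scale every column with the min/max of the training data (assumed form of Eq. (1))."""
    columns = list(zip(*train_rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        # A constant column has no spread; map it to 0.0 to avoid division by zero.
        return [
            (u - lo) / (hi - lo) if hi > lo else 0.0
            for u, lo, hi in zip(row, lows, highs)
        ]

    return [scale(r) for r in train_rows], [scale(r) for r in test_rows]


# Hypothetical voltage (kV) and current (kA) samples, one row per fault case.
train = [[100.0, 2.0], [200.0, 4.0], [150.0, 3.0]]
test = [[175.0, 2.5]]
train_scaled, test_scaled = min_max_normalize(train, test)
print(train_scaled)
print(test_scaled)
```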
ANN learning methods are basically classified into three types: supervised learning, unsupervised learning and reinforcement learning. Here, the supervised learning method has been used, as shown in Fig. 6 [4]. The estimated output is compared with the desired output, and the error signal is generated as the difference between the predicted values and the actual values. Based on the error signal, the weights are modified to minimize the error so that the calculated output matches the desired output. The ANN algorithm can be applied as a feed forward or a feedback neural network. In this paper, the back propagation method has been applied for detection and classification of faults. The back propagation algorithm progressively corrects the weights among the different layers according to the difference between the targeted output and the calculated output. A differentiable activation function makes back propagation achievable, since the gradients are passed back with the error to update the weights and biases. Linear activation functions and nonlinear activation functions such as Sigmoid, Tanh, ReLU and Softmax are used to achieve an accurate output. The output of the hidden layer is calculated from the activation function. The activation value of a node depends on the sum of the bias and the weighted sum of all inputs connected to it, as given by Eq. (2) and Eq. (3).
In normal practice, the Rectified Linear Unit (ReLU) activation function is preferred because it requires less computation, is faster in operation and reaches the desired output easily. For binary classification, however, the sigmoid function is widely used. The sigmoid nonlinear transformation is used here to detect and classify faults, as shown in Eq. (5). Equation (5) is computed from Eq. (2), Eq. (3) and Eq. (4).
where $Net_j$ is the net input of the $j$-th layer and $b_i$ is the bias of the hidden layer. The bias is the degree of sensitivity with which the hidden layer $u_j$ responds to the perturbation it receives from the net input. Equation (5) represents the feed forward algorithm of the neural network. The error factor is calculated as the sum of the squared differences between the target outputs and the actual outputs [27], as shown in Eq. (6) and Eq. (7).
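The feed forward step and the error factor described above can be sketched in a few lines. The sketch assumes the common forms $Net_j = \sum_i w_{ji} u_i + b_j$ with a sigmoid activation, and a squared error with a factor of 1/2; the equations themselves are not reproduced here, so these forms, the function names and the sample numbers are assumptions.

```python
import math


def sigmoid(x):
    """Sigmoid activation, used here for binary (in-zone / out-of-zone) classification."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """weights[j][i] connects input i to node j; returns the activations u_j."""
    outputs = []
    for w_j, b_j in zip(weights, biases):
        net_j = sum(w * u for w, u in zip(w_j, inputs)) + b_j   # net input of node j
        outputs.append(sigmoid(net_j))
    return outputs


def squared_error(targets, outputs):
    """Sum of squared differences between targets and outputs (1/2 factor assumed)."""
    return 0.5 * sum((t - u) ** 2 for t, u in zip(targets, outputs))


# Hypothetical two-input, one-node layer with zero weights: the output is sigmoid(0) = 0.5.
u_out = layer_forward([1.0, 2.0], [[0.0, 0.0]], [0.0])
err = squared_error([1.0], u_out)
print(u_out, err)
```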
The error signal is defined in Eq. (6) and Eq. (7), where $E$ is the error, $x$ is the model, $t_k$ is the target and $u_k$ is the output. To correct the weights so that the desired output is achieved, the back propagation delta rule has been applied. The error coefficient in the delta rule is calculated from the difference between the target output and the actual output, multiplied by the derivative of the activation state of that output with respect to its net input, as shown in Eq. (8) and Eq. (9).
where $t_j$ is the target output, $u_j$ is the actual output, and $u_j (1 - u_j)$ is the derivative of the actual output with respect to the net input of the $j$-th layer, as given in Eq. (8). The error coefficient of the back propagation method is indicated in Eq. (9).
Substituting Eq. (11) into Eq. (12), and then substituting Eq. (12) and Eq. (9) into Eq. (10), yields the weight correction. The quantity added to or subtracted from the weight $w_{ji}$ depends on $\delta out_j$, on the activation state of the node $u_i$ to which $u_j$ is connected through the weight $w_{ji}$, and on the coefficient $r$, as shown in Eq. (13). The correction $\delta w_{ji}$ can be negative or positive, so it is added to or subtracted from the previous value of the weight $w_{ji}$, as shown in Eq. (14).
In Eq. (14), each weight arriving at a layer has an actual value that is compared with an ideal value, as mentioned in articles [14] and [27]. Figure 7 shows the flowchart of the ANN training model.
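The delta rule described above can be sketched for a single sigmoid output node. The sketch assumes the forms $\delta out_j = (t_j - u_j)\, u_j (1 - u_j)$, $\delta w_{ji} = r \cdot \delta out_j \cdot u_i$ and $w_{ji} \leftarrow w_{ji} + \delta w_{ji}$, which follow the verbal description of Eq. (8) to Eq. (14); the learning coefficient, inputs and target are made-up values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def delta_out(target, output):
    """Error coefficient: output error times the sigmoid derivative u_j * (1 - u_j)."""
    return (target - output) * output * (1.0 - output)


def update_weights(weights, inputs, deltas, rate):
    """Apply w_ji <- w_ji + r * delta_j * u_i for every connection."""
    return [
        [w + rate * d * u for w, u in zip(w_j, inputs)]
        for w_j, d in zip(weights, deltas)
    ]


# Hypothetical single node: two inputs, target 1.0, learning coefficient r = 0.5.
inputs, target, rate = [1.0, 0.5], 1.0, 0.5
weights, bias = [[0.0, 0.0]], 0.0

out_before = sigmoid(sum(w * u for w, u in zip(weights[0], inputs)) + bias)
delta = delta_out(target, out_before)
new_weights = update_weights(weights, inputs, [delta], rate)
new_bias = bias + rate * delta   # the bias is treated as a weight on a constant input of 1
out_after = sigmoid(sum(w * u for w, u in zip(new_weights[0], inputs)) + new_bias)

print(delta, new_weights, new_bias)
print(out_before, out_after)
```

After one update the output moves from 0.5 towards the target of 1.0, which is the behaviour the weight correction is meant to produce.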

Discussion
The ANN back propagation model is trained using MATLAB functions and Python code. Both MATLAB and Python give satisfactory fault classification results, as shown in Tab. 3 to Tab. 6. Table 3 shows the overall classification accuracy for in-zone faults (line between buses 8 and 9) and out-of-zone faults on the transmission line using Python. Table 4 shows the classification accuracy by fault type using Python. Table 5 shows the fault classification accuracy with 6 hidden layers for different training functions in MATLAB. The choice of training function depends on many factors, such as the complexity of the problem, the number of data points in the training set, the number of weights and biases in the network, the number of hidden layers, the error goal, and whether the network is being used for pattern recognition or regression. The classification accuracy increases when the number of hidden layers is increased from 6 to 10, as given in Tab. 6 and Tab. 7, respectively.
The confusion matrices of all symmetrical and asymmetrical internal faults with 6 and 10 hidden layers are plotted in Fig. 8 and Fig. 9, respectively. Each cell of the confusion matrix gives the number of fault observations. The rows of the confusion matrix correspond to the output class (predicted value) and the columns correspond to the target class (actual value). All ten types of faults are listed in the confusion matrix; for example, the first three are the L-G faults on phases R, Y and B, respectively. Diagonal and off-diagonal cells show correctly and incorrectly classified fault observations, respectively. As shown in Fig. 8, out of 1920 fault cases, 192 cases are taken for each fault type. The first column shows that all 192 R-G fault cases are correctly classified as R-G, so the column accuracy for the R-G fault type is 100 %. The first row contains the 192 R-G faults along with 4 YB (LL) faults that are misclassified as R-G (L-G) faults, so the row accuracy is reduced to 98 %. It has been observed from Fig. 8 and Fig. 9 that the classification accuracy increases with the number of hidden layers. Similarly, the accuracy of identification of external faults increases when the number of hidden layers is raised from 6 to 10. A further increase in the number of hidden layers improves the classification accuracy, but the convergence time also increases, which slows down the learning process. The performance curves of training, validation and test data for internal faults with 10 hidden layers are shown in Fig. 10. All three curves have a similar shape, which means that the network responds to the validation and test data in the same way as to the training data, reducing the probability of over-fitting [21].
Over-training, or over-fitting, occurs if the validation error increases while, at the same epoch, the training error continues to decrease.
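The row and column accuracies read from the confusion matrix can be reproduced with a short sketch. The two-class excerpt below uses the R-G figures quoted in the text (192 correct R-G cases and 4 YB cases misclassified as R-G); the remaining YB entry of 188 is an assumption that the other YB cases are classified correctly, and the full ten-class matrix of Fig. 8 is not reproduced.

```python
def row_and_column_accuracy(matrix, index):
    """matrix[output][target]; return (row accuracy, column accuracy) in percent."""
    correct = matrix[index][index]
    row_total = sum(matrix[index])                   # everything predicted as this class
    col_total = sum(row[index] for row in matrix)    # everything that truly is this class
    return 100.0 * correct / row_total, 100.0 * correct / col_total


labels = ["R-G", "Y-B"]
excerpt = [
    [192, 4],     # output R-G: 192 true R-G faults, 4 true Y-B faults
    [0, 188],     # output Y-B: assumed 188 true Y-B faults
]

row_acc, col_acc = row_and_column_accuracy(excerpt, labels.index("R-G"))
print(f"R-G row accuracy: {row_acc:.2f} %, column accuracy: {col_acc:.2f} %")
```

The row accuracy of 192/196 is about 97.96 %, which rounds to the 98 % quoted for the first row, while the column accuracy is 100 %.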
A regression analysis between the network output and the corresponding target was carried out. Figure 11 shows the regression plot of the in-zone faults with 10 hidden layers. It shows a good fit of the ANN predicted values to the actual output data for the training (70 %), testing (15 %), and validation (15 %) data sets. The overall plot includes the training, testing and validation data sets together. In Fig. 11, R denotes the regression (correlation) coefficient between the network outputs and the targets; a value close to 1 indicates a close linear fit. The output equation used for the regression plot is given in Eq. (15).
where $w$ is the weight and $b$ is the bias.
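A regression plot of this kind can be summarized by a least-squares line and a correlation coefficient. The sketch below assumes that Eq. (15) has the form output = w · target + b and computes w, b and R from paired targets and outputs; the sample values are made up and the function name is illustrative.

```python
import math


def linear_fit(targets, outputs):
    """Least-squares slope w, intercept b and correlation coefficient R."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_oo = sum((o - mean_o) ** 2 for o in outputs)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    w = s_to / s_tt
    b = mean_o - w * mean_t
    r = s_to / math.sqrt(s_tt * s_oo)
    return w, b, r


# Hypothetical targets (0 = out-of-zone, 1 = in-zone) and network outputs.
targets = [0.0, 1.0, 0.0, 1.0]
perfect = linear_fit(targets, targets)
noisy = linear_fit(targets, [0.1, 0.9, 0.0, 1.0])
print(perfect)
print(noisy)
```

When the outputs equal the targets, the fit returns w = 1, b = 0 and R = 1; any scatter in the outputs lowers R below 1.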

Support Vector Machine (SVM) Classification Technique
SVM is a statistical technique for computational learning that overcomes a drawback of the ANN by giving a global solution rather than a local minimum [16]. SVM classifiers offer high accuracy and work well in high-dimensional spaces. SVM classifiers use only a subset of the training points (the support vectors), so very little memory is required in validation. SVM classifiers can be used either as a single-layer binary classifier, which has two possible states, in-zone fault (+1) and out-of-zone fault (−1), or as a multi-layer classifier, which is a discrete classifier mainly focused on regression problems. The SVM classifier provides the maximum margin between the different class labels. The boundary between the in-zone and out-of-zone fault classes is known as the hyperplane [8]. It is represented by Eq. (16).
where $w$ is the weight vector and $b$ is the bias term that determines the position of the hyperplane.
The separation distance can be increased by minimizing the norm of $w$. For linear separation, the SVM can be realized by the support vectors as shown in Eq. (17). The labels of the output class are given in Eq. (18) and Eq. (19).
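For orientation, the standard textbook form of a linear hard-margin SVM is sketched below. It is assumed, not confirmed, that Eq. (16) to Eq. (19) follow this form; the symbols $\mathbf{x}_i$, $y_i$ and $n$ are introduced here for illustration.

```latex
\begin{align}
  % Separating hyperplane between the two classes
  \mathbf{w}^{\top}\mathbf{x} + b &= 0 \\
  % Maximizing the margin 2 / \lVert \mathbf{w} \rVert is equivalent to
  \min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad &\text{subject to}\quad
  y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
  \qquad i = 1, \dots, n \\
  % Class label assigned to a new sample
  f(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
  &= \begin{cases}
       +1 & \text{in-zone fault} \\
       -1 & \text{out-of-zone fault}
     \end{cases}
\end{align}
```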
Here, the output function $f(x)$ equal to '+1' indicates one class label (in-zone fault) and '−1' indicates the second class label (out-of-zone fault). The flow chart of the SVM classifier algorithm is shown in Fig. 12. Cost (C) and Gamma (γ) are hyperparameters, which are set before training the model, as given in the SVM flow chart. These hyperparameters control the error and the curvature of the decision boundary, respectively. When C is small, the margin is wide, so there are many support vectors and many misclassified observations. When C is large, the margin is narrow, so there are fewer support vectors and fewer misclassified values. However, a low value of cost (C) gives better performance on test data sets and also prevents over-fitting. The accuracy of the support vector machine learning algorithm is given in Eq. (20) [7] and [20].
$$\%\,\text{Accuracy} = \frac{\text{Accurately classified samples}}{\text{Total number of samples}} \times 100. \tag{20}$$

Table 8 indicates the internal and external fault detection and classification accuracy for the test data set, which is not part of the training data set. Table 9 and Tab. 10 show the accuracy of internal and external fault identification, respectively, in Python. The accuracy tabulated in Tab. 8 has likewise been verified in MATLAB and Python (SVM training and SVM model fit functions).
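Equation (20) and the role of γ can be illustrated with a short sketch. The accuracy check uses the out-of-zone total reported in the results (2383 correctly classified out of 2400 test cases). The RBF kernel form $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$ is the standard definition and is assumed to be the kernel meant by γ; the sample vectors are made up.

```python
import math


def percent_accuracy(correct, total):
    """Eq. (20): share of accurately classified samples, in percent."""
    return 100.0 * correct / total


def rbf_kernel(x, z, gamma):
    """Standard RBF kernel; a larger gamma makes the similarity decay faster with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


out_of_zone_accuracy = percent_accuracy(2383, 2400)
print(f"{out_of_zone_accuracy:.2f} %")

# Hypothetical feature vectors one unit apart, compared for two values of gamma.
x, z = [0.0, 0.0], [1.0, 0.0]
k_low_gamma = rbf_kernel(x, z, gamma=0.1)
k_high_gamma = rbf_kernel(x, z, gamma=10.0)
print(k_low_gamma, k_high_gamma)
```

With a small γ the two samples still look similar to the kernel, giving a smoother decision boundary, while a large γ makes the kernel value drop close to zero and the boundary more curved.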

Wind Farm Impact on Transmission Line Protection
[Total row of the out-of-zone (external) fault table: 2400 cases, 2383 correctly classified, 17 misclassified, 99.29 % accuracy]

It has been noted that the accuracy obtained by applying SVM for fault classification is highest in the case of in-zone faults, i.e. 99.89 %. The accuracy of the proposed algorithm is maximum for the L-G fault, which is the most frequent fault in power systems. Moreover, the fault classification accuracy of the proposed SVM algorithm is 99.29 % in the case of out-of-zone faults. This indicates a high level of security in rejecting power system transients occurring outside the line to be protected.

The variation in wind parameters significantly affects distance measurement in transmission line protection. Fluctuation in wind speed causes variation in the voltage level at the point of connection to the power grid, and this leads to a change in the impedance measured by the protective relays [2]. The impact of a 3-phase short circuit on a transmission line connected with a DFIG is more critical than that of a line to ground fault. Wind farms are unstable during dynamic voltage disturbances because of the induction generator. When a fault occurs or the voltage drops, the power delivered by the wind generator to the grid decreases and the generator starts to accelerate. If the generator accelerates faster than the voltage recovers, the rotor speed increases and the generator absorbs more reactive power. If the speed exceeds the set limit, the whole unit is disconnected from the system. A change in positive and zero sequence impedance leads to overreach and underreach issues in the protection strategy. The fault resistance, load angle variation and fault location equally affect the performance of the protective relay in a wind-connected system [28].
The impact of wind generation is more pronounced for Type III Wind Turbine Generators (WTGs), especially when the wind park is tapped onto the line without installing additional relays at both sides of the connection point. The impact of this penetration varies from a delayed operation to a failure in operation of the protective scheme. When WTGs are integrated, the short circuit current contribution is limited by their controllers. Moreover, the induction generator of a Type III WTG provides a path for negative sequence current, and this results in a larger error in the distance measured by the relay compared with other types of WTG. A Type III WTG is taken for this study, and its impact is included in the data generation according to the wind park type, fault type, fault location and wind generation level.
The results in Tab. 3 to Tab. 10 demonstrate the high accuracy of the proposed ANN and SVM algorithms when the impact of the WTG in the interconnected network is considered.

Conclusion
In this paper, an ANN and SVM based internal and external fault discrimination scheme has been tested in the presence of a wind generation system. Three-phase voltage and current signals are sampled for one full cycle after the inception of an internal or external fault. The sampled data are given as input to the ANN and SVM algorithms for training, and the trained model is used for testing. The feasibility of the proposed scheme has been tested on 4320 test cases with varying internal and external fault conditions. The proposed scheme provides more than 98 % discrimination accuracy in the case of the ANN technique with 10 hidden layers using the MATLAB development tool. On the other hand, the SVM technique gives more than 99 % accuracy with less convergence time than the ANN method. Both methods are simple, fast and accurate in classifying faults. The following concluding remarks are drawn after executing the mentioned machine learning approaches on the same test data set.
1. Both algorithms are parametric. For the ANN, the parameters include the number of hidden layers, learning rate, activation function, number of iterations, and threshold error, while for the SVM, the parameters include the kernel function, gamma, and the margin parameter C.
2. Both algorithms can work for linear and non-linear functions.
3. The ANN and SVM classification approaches give comparable accuracy and reliability, depending on the training.
4. The ANN-based algorithm involves a trial and error procedure to find the number of layers, neurons, and activation functions, which makes the overall design process tedious and complex. The ANN algorithm is called a learning based approach.
5. Disturbance or interruption may occur in an electrical power system at any time due to the intermittent nature of renewable energy sources. The ANN is capable of incorporating dynamic changes of the system.
6. The accuracy of the SVM classifiers depends on the size of the input data structure and on optimization of the kernel parameters.
7. The SVM can work well with small as well as large training data sets, because the maximum margin boundary condition decides the accuracy. In contrast, if a sufficiently large data set is not given to the ANN, it may result in extremely poor classification.
8. Training algorithm is very fast in SVM compared to ANN. The ANN training process is quite complex for high-dimension problems. The ANN offers slow convergence in the BP algorithm. Convergence is dependent on the selection of the initial value of weight constraints.
9. In the ANN, the initial randomization may place the neural network close to a local minimum of the optimization function, whereas the SVM converges to the global minimum irrespective of the initial condition.
It is concluded that the SVM classification technique is more effective, simpler and faster than the ANN for detection and classification of faults in the presence of a wind generating system. A few factors, such as power swing conditions and fault classification on series compensated transmission lines with wind power generation, are not included in the present study. They will be included in future investigations with diversified fault scenarios to verify the accuracy of fault classification and detection in the presence of multi-terminal lines with renewable sources.