A novel graph search and machine learning method to detect and locate high impedance fault zone in distribution system

A high impedance fault (HIF) is difficult to detect with conventional overcurrent protection relays because the fault current is low, normally lower than the normal load current. A fast and reliable algorithm is required to detect this type of fault. This paper proposes a novel method for locating the HIF zone in a distribution system by combining a graph theory-based zone detection technique with a Random Search Multilevel Support Vector Machine (RSMSVM) algorithm that classifies the faulted zone. Owing to its shift-invariance property, the Dual Tree Complex Wavelet Transform (DTCWT) is used to decompose the voltage/current waveforms, collect the signatures of the signals, and feed them to the optimized RSMSVM model for fault zone classification. The proposed method is evaluated on the IEEE 33-bus system and the IEEE 39-bus test system under normal and noisy conditions. It is also evaluated for a distribution network with integrated distributed generation.


INTRODUCTION
Recent developments in signal processing have provided various smart methods for fault detection and classification in distribution systems. Existing methods fail to detect a high impedance fault (HIF) because the fault current is low or very close to the load current. An HIF occurs when a conductor in a distribution network breaks and comes into contact with the ground, or leans and comes into contact with a tree surface. If not detected properly, it can lead to a very severe accident. 1 Extensive research is under way to detect HIFs, and the majority of it concentrates on developing a sensitive fault detector to identify such faults. Numerous methods have been proposed to detect HIFs. 2 Research on detecting HIFs started slowly in the early 1980s and continued until 1990; Huang et al. 3 proposed a method to detect HIFs based on staged fault tests.
Vigorous research began in the 1990s. Emanuel et al. 4 carried out experimental laboratory work to understand the behavior of HIF arcing on sandy soil in 15 kV distribution feeders and developed a fault detection approach based on current harmonics. Current and voltage measurements play a vital role in detecting HIFs. Both signals can be inspected with various signal processing techniques to identify the fault and its location. Mamishev et al. 5 proposed a method to detect HIFs using fractal techniques, but it is not effective because too few data sets are available for estimating the fault. Among the many methods, those that extract features from voltage and current signals and pass them to an artificial intelligence based classifier are the most successful. A time-domain mathematical morphology method is proposed by Gautam and Brahma 6 to analyze the irregular HIF waveform. Kavi et al. 7 designed a fault detector for HIFs in the distribution system using a time-domain mathematical morphology technique. Despite their simplicity, time-domain techniques lack frequency-domain features, which affects the accuracy of fault detection. Frequency-domain techniques are further classified into low-frequency and high-frequency techniques. In these techniques, the frequency components of the voltages and of the third harmonic current are examined for HIF waveform analysis. Fast Fourier Transform (FFT) based feature extraction is mainly used to extract the high-frequency components in frequency-domain HIF analysis. Time-frequency domain analysis estimates the energy of each signal at every time and frequency coordinate. It has advantages such as coherent time-frequency support, time-frequency localization, and features that are highly interpretable.
Samantaray et al. 8 proposed a time-frequency transform based technique to detect HIFs in the distribution system using Probabilistic Neural Network (PNN) based pattern recognition. Although the time-frequency domain has advantages, it requires more computation than the other domains. Time-scale domain analysis extracts both time and frequency features of the fault signal. Most Wavelet Transform (WT) based techniques fall under this category. Silva et al. 9 presented a WT based algorithm to detect HIFs in a distribution system. WT and evolving network-based techniques are compared with other existing techniques such as Support Vector Machine (SVM), PNN, and Multi-Layer Perceptron (MLP).
Souza et al. 10 proposed a Discrete Wavelet Transform (DWT) feature extraction based HIF waveform analysis for electrical distribution systems. DWT based detection and transient power direction based HIF location identification for MV networks is discussed in Reference 11. More research on WT based HIF detection is discussed in References 12,13. Ledesma et al. proposed a neural network based method to locate HIFs (Reference 14). Combined time-domain and frequency-domain algorithms are discussed in Reference 15, whereas a time-scale and frequency-domain algorithm is given in Reference 16. Most of the works discussed use either artificial intelligence based or machine learning based classifiers for pattern recognition. In Reference 17, HIF diagnosis is presented for underwater cables with mesh topology. A WT based method that detects HIFs using power spectral density is proposed in Reference 18. Recently, many researchers have proposed new methodologies to detect HIFs in distribution systems. Wei et al. proposed a distortion based algorithm to detect HIFs in distribution systems (Reference 19). Gu et al. designed an enhanced feeder terminal unit to detect HIFs in overhead distribution lines (Reference 20). An Artificial Neural Network based HIF location identification method is discussed in Reference 14. Dubey and Jena proposed a method to detect low impedance faults and HIFs in microgrids using impedance calculations (Reference 21). A parameter determination based method to detect high impedance arc faults is discussed in Reference 22. A four-stage One-Dimensional Variational Prototyping-Encoder based method to detect HIFs in distribution systems is discussed in Reference 23. A theoretical study that analyzes the non-linear characteristics of HIFs is presented in Reference 24.
A new method based on a piecewise linear fitting technique, which solves state equations to detect arc HIFs, is discussed in Reference 25. Wang et al. proposed a method to detect HIFs in distribution networks based on stochastic resonance combined with variational mode decomposition (Reference 15). An empirical WT based detection of HIFs is discussed in Reference 26. However, zone identification has not been incorporated in these works.
The Dual Tree Complex Wavelet Transform (DTCWT) can solve the problems of shift variance and low directional selectivity in two and higher dimensions under noisy conditions. In this paper, DTCWT is therefore used for signal analysis instead of the discrete WT. Genetic Algorithm based optimization is used to place measuring devices at optimal locations. The Genetic Algorithm is a widely used optimization technique that solves a variety of problems and is very efficient in machine learning tasks. 27 The effectiveness of the proposed method is tested on the IEEE 33-bus and IEEE 39-bus test systems under normal and noisy conditions. The results are compared with different multilevel SVMs, and the proposed method with the optimum sampling frequency is found to be the most accurate for HIF detection. The proposed methodology is designed and developed according to the standards of the SEL-751 feeder protection relay. In the SEL-751 commercial feeder relay, HIF detection is an additional feature and not an integral part of the relay. Arc Sense Technology (AST), which is based on the Sum of Difference Currents (SDI) decision method, is used to monitor HIFs. 28 This paper contributes a novel two-stage algorithm to detect, classify, and locate HIFs in distribution systems. It also contributes a new zone protection scheme based on a graph search method. The proposed algorithm is tested on a real-time distribution system, the 10-generator IEEE 39-bus test system, with experimental arc parameters. The proposed method accurately locates the fault zone on a multi-configured distribution system. The choice of sampling frequency is also addressed to ensure high accuracy in detecting and locating faults. The method locates faults in both balanced and large unbalanced distribution networks.
The proposed algorithm reduces the computational burden of combining signal processing and data mining through techniques such as the choice of sampling frequency, a reduced level of decomposition, and data cleaning by entropy measurement. This paper also contributes a hyper-tuned SVM for classifying the fault zone, and its results are compared with those of a normal SVM machine learning algorithm.

PROPOSED METHODOLOGY
The proposed algorithm for detecting, locating, and classifying HIFs is based on pattern classification. An SVM based machine learning algorithm performs the pattern classification task. Data acquisition is the first major task, and to achieve it, measuring devices are placed at optimal locations. In this paper, the measuring devices, that is, smart meters, are mounted on electrical poles at optimal locations to send and receive voltage and current signal data to and from the main substation. The TCP/IP communication protocol is used for two-way data communication. 29 The fault zone is the small area where the fault has occurred. It is essential to isolate the healthy zone from the fault zone during a fault to ensure continuous power supply. Pattern classification is defined as allocating an object or event to one of several classes based on features derived to recognize the common qualities in the data. It involves three steps: (1) measuring basic quantities such as current and voltage from the instrument transformers; (2) extracting the basic features from the acquired data; and (3) classifying the data through suitable classifiers. In this paper, DTCWT is used to decompose the voltage and current signals. 30 WT suffers from disadvantages such as shift sensitivity, poor directionality, and lack of phase information, which affect the performance of the algorithm. DTCWT is an improved form of DWT. An entropy measurement based feature extraction method is used, and the extracted features are given to the SVM for pattern classification. In addition to detecting the fault, the proposed algorithm can also classify non-HIF faults and identify the location of the fault. A genetic algorithm and graph theory based zone protection scheme is proposed to achieve fault identification in the distribution system. The proposed algorithm consists mainly of two stages.
In the first stage, the pre-fault and post-fault data are collected from the optimally placed measuring devices and processed with DTCWT. The required features are selected through entropy calculation, and decision rules are made to detect HIFs and distinguish them from non-HIF faults. In the second stage, the data are labeled into three zones, namely Zone 1, Zone 2, and Zone 3, by the graph search method. The fault zone is identified by the Random Search Support Vector Machine classifier. The flowchart of the proposed methodology is shown in Figure 1. The real-time working process of the proposed methodology is shown in Figure 2.

Coefficient and entropy calculation using DTCWT
The DTCWT consists of two wavelet trees; the first tree gives the real part of the transform while the second tree gives the imaginary part. Let $\psi(t)$ be a two-dimensional complex wavelet, with the real-valued wavelet denoted by $\psi_h(t)$ and the imaginary-valued wavelet denoted by $\psi_g(t)$. Then the complex wavelet can be written as
$$\psi(t) = \psi_h(t) + j\,\psi_g(t)$$

FIGURE 2 Real-time working of proposed methodology in distribution system with graphical scenario

The approximate and detail coefficients of the signal x(t) are derived 31 as
where (h) and (h) are scaling factors. It should be noted that the first stage of signal decomposition needs one type of filter and the later stages need another type. The design of Q-shift filters is based on choosing good even-length low-pass filters. The low-pass filter $h_L(Z)$ of length 2n with a delay of approximately 1/4 sample is designed from a linear-phase low-pass filter $h_{L2}(Z)$ of length 4n as:
where $h_{L2}$ has half the desired bandwidth and twice the desired delay. The filters after the first level of decomposition are derived as
Equations (3)-(6) are applied to the voltage signals to obtain the coefficients that are input to the proposed method.

Entropy measurement based feature selection
The decomposed pre-fault and post-fault signal data contain many unwanted components. Feature selection reduces the irrelevant data in the raw data. In this paper, an entropy measurement based feature selection method is used to select the correct data for further processing in the algorithm. It is suitable for eliminating noisy and unnecessary data. 32 Entropy carries information about the uncertainty and the amount of the signal. Therefore, entropy measurement gives information about defects in signals. The entropy measurement process starts with the calculation of the wavelet energy. The energy entropy of the detail coefficients can reflect the characteristics of the arc voltage, and the formula of the energy entropy ($E_i$) is stated as
where $C_i(t)$ are the detail coefficients of the arc voltage extracted by DTCWT and i is the number of extraction levels. The distribution of energy is defined as the ratio of the energy of the sub-band signal to the sum of energies. It is defined as
The spectral entropy of a signal based on wavelet theory is mathematically denoted as
To scale the entropy data and organize them in a structured way, the entropy values are scaled to the range from 0 to 1. This process is known as normalization. Normalization is done by dividing the spectral entropy by the logarithm of N, where N is the number of frequency points or half the length of the time series.
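The entropy chain described above (sub-band energy, energy distribution, spectral entropy, normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the energy is assumed to be the sum of squared coefficient magnitudes, the entropy is assumed to be the Shannon entropy of the energy distribution, and the normalizing N is taken here as the number of sub-bands so that the result lies between 0 and 1. The function names are hypothetical.

```python
import math


def subband_energies(coeffs_per_level):
    """Energy of each sub-band, assumed to be the sum of squared magnitudes."""
    return [sum(abs(c) ** 2 for c in level) for level in coeffs_per_level]


def normalized_spectral_entropy(coeffs_per_level):
    """Shannon entropy of the sub-band energy distribution, scaled to [0, 1].

    Assumption: N in the normalization is the number of sub-bands.
    """
    energies = subband_energies(coeffs_per_level)
    total = sum(energies)
    if total == 0 or len(energies) < 2:
        return 0.0
    # Energy distribution: share of each sub-band in the total energy.
    shares = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(len(energies))


# Toy detail coefficients for two decomposition levels (complex values allowed).
levels = [[0.2 + 0.1j, -0.3, 0.05], [1.5, -1.2j, 0.9]]
print(normalized_spectral_entropy(levels))
```

Energy spread evenly over the sub-bands gives a value near 1, while energy concentrated in one sub-band gives a value near 0.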

Genetic algorithm and graph theory based zone selection method
The objective function of the Measurement Devices Optimal Placement Problem (OPP) is given as
where D is a connectivity matrix and n is the number of buses. The matrix D is represented in the form of
where B is a column matrix, represented as
where n is the number of buses in the system and X is the number of measuring devices. $X_i = 1$ if a measuring device is placed at bus i, and $X_i = 0$ otherwise. The objective function is zero when the system is completely observable. To calculate the fitness function, an observability analysis should be carried out,
where $N_{md}$ is the number of measuring devices and $N_h$ is the number of observables. The values of a, b, and c are taken as 1, 2, and 1, respectively. The parameter values used in this optimization problem are: size of the population = 200, crossover operator = uniform, parent selection = Roulette Wheel method. The fitness function was optimized and converged to optimal solutions, which are shown in Table 1.
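The observability analysis that feeds the fitness function can be illustrated with a short sketch. It assumes the common placement rule that a measuring device observes its own bus and every directly connected bus; the paper's exact objective and fitness expressions are not reproduced here, and the function names and the five-bus feeder are hypothetical.

```python
def observable_buses(adjacency, placement):
    """Buses observed by the devices, assuming each device sees its own bus
    and all directly connected buses."""
    observed = set()
    for bus in placement:
        observed.add(bus)
        observed.update(adjacency[bus])
    return observed


def unobserved_count(adjacency, placement):
    """Number of buses left unobserved; zero means complete observability."""
    return len(set(adjacency) - observable_buses(adjacency, placement))


# Hypothetical five-bus radial feeder 1-2-3-4-5.
feeder = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(unobserved_count(feeder, {2, 4}))
print(unobserved_count(feeder, {1}))
```

A genetic algorithm would evaluate a count of this kind for every candidate placement vector and favor placements that reach zero with the fewest devices.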
Based on these optimal measuring device locations, a zone protection scheme is designed. Out of the three sets of optimal solutions, set one is selected for further processing. A novel zone protection method for distribution networks is proposed based on a graph theory approach to simplify the complex power system. Microprocessor-based relays are used at the fault detectors. 33 In this paper, graph theory elements such as vertices (V) and edges (E) are used to represent the power system topology. The rules considered for zone separation are as follows: Rule 1: If an initial bus is connected to the current protection zone through a vertex, they are combined to form a new protection zone. Rule 2: If protection zones contain the same buses, one is kept as the zone and the other clone zones are eliminated.
These two rules define the search problem of finding protection zones given the optimally placed measuring devices. From these two rules, a novel protection scheme is proposed. The following steps of the algorithm are tested on the IEEE 33-bus system, and they are illustrated in Figures 3-5. Step 1: Every basic bus is treated as an initial zone. For example, bus 1 in the IEEE 33-bus test system is itself a zone. This step is shown diagrammatically in Figure 3.

FIGURE 3 Step 1 in protection zone search technique

FIGURE 4 Step 2 in protection zone search technique

FIGURE 5 Step 3 in protection zone search technique

Step 2: In this step, the initial bus searches for adjacent buses and combines all the buses near it to form a new zone. This step is shown in Figure 4.
Step 3: In this step, the search rules compare the existing zone with the new zone formed in Step 2. If zones contain the same buses, the duplicate zones are eliminated to make sure all the buses are protected uniformly. This step is shown in Figure 5.
Step 4: In this step, the search method checks whether the number of buses in the zones equals the number of buses present in the network; if so, the search process is complete. If not, go to Step 2. From the graph theory-based search method, three zones are selected, which are tabulated in Table 2.
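Steps 1 to 4 can be read as a breadth-first growth of zones from a set of starting buses. The sketch below is one possible interpretation and not the authors' code: it assumes that each starting bus seeds one zone (Step 1), that zones absorb adjacent buses one layer at a time (Step 2), that a bus already claimed by a zone is never assigned again (the duplicate elimination of Step 3), and that the search stops once every reachable bus is assigned (Step 4). The function name and the six-bus feeder are hypothetical.

```python
from collections import deque


def grow_zones(adjacency, seeds):
    """Partition buses into zones by breadth-first growth from the seed buses."""
    zone_of = {seed: idx for idx, seed in enumerate(seeds)}  # Step 1
    frontier = deque(seeds)
    while frontier:  # Step 4: loop ends when no unassigned neighbour remains
        bus = frontier.popleft()
        for neighbour in adjacency[bus]:  # Step 2: look at adjacent buses
            if neighbour not in zone_of:  # Step 3: skip buses already zoned
                zone_of[neighbour] = zone_of[bus]
                frontier.append(neighbour)
    zones = [[] for _ in seeds]
    for bus, idx in zone_of.items():
        zones[idx].append(bus)
    return [sorted(zone) for zone in zones]


# Hypothetical six-bus radial feeder 1-2-3-4-5-6 with seed buses 1 and 6.
feeder = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(grow_zones(feeder, [1, 6]))
```

Because every bus is assigned exactly once, no two zones share a bus, which is the property that Rule 2 is meant to enforce.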

Classification of fault zone by using multi-level random search SVM method
SVM is a learning machine classifier for solving pattern classification problems. In SVM, data are divided into two classes, a positive class and a negative class, which are placed on a spherical Gaussian surface for training. Post-fault data are grouped as the positive class and pre-fault data are mapped as the negative class. The optimization criterion of the SVM in the training stage is the margin between the training samples, that is, it looks for a decision boundary (hyperplane) with the largest margin between the training data. This margin can be defined as the distance to the nearest samples. These samples are called "support vectors." The hyperplane separates the two classes on the spherical Gaussian surface to achieve data classification. Hyper-tuning of the parameters increases performance compared with the normal multilevel SVM. In a Gaussian kernel SVM, it is necessary to select a regularization penalty C, which controls the margin, and the kernel bandwidth for training. The problem of picking good values for the hyper-parameters to minimize the generalization error is called the problem of hyper-parameter optimization. 34 The optimal value is taken as
$$\lambda^{*} = \arg\min_{\lambda}\ \mathrm{mean}\ L\big(X;\, A_{\lambda}(x^{(\mathrm{train})})\big)$$
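The hyper-parameter search can be sketched independently of any SVM library. The example below is a minimal illustration, not the authors' implementation: it draws C and the kernel bandwidth parameter (called gamma here) log-uniformly at random and keeps the pair with the lowest validation loss. The loss function `toy_validation_loss` is a hypothetical stand-in for the mean validation error of a trained RBF SVM, and the search ranges are assumptions.

```python
import math
import random


def random_search(validation_loss, log10_bounds, n_trials=50, seed=0):
    """Sample hyper-parameters log-uniformly and keep the lowest-loss draw."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_trials):
        params = {
            name: 10 ** rng.uniform(low, high)
            for name, (low, high) in log10_bounds.items()
        }
        loss = validation_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(params):
    """Hypothetical loss surface with its minimum at C = 10, gamma = 0.1."""
    return (math.log10(params["C"]) - 1.0) ** 2 + (
        math.log10(params["gamma"]) + 1.0
    ) ** 2


bounds = {"C": (-2.0, 3.0), "gamma": (-4.0, 1.0)}  # assumed ranges, in log10
best, loss = random_search(toy_validation_loss, bounds, n_trials=200, seed=1)
print(best, loss)
```

In practice the loss callable would train the classifier on the training split and return its error on a validation split. A grid search differs only in that the candidate values are enumerated on a fixed lattice instead of being sampled.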

HIF MODELING
A simplified HIF model is connected to the 12.66 kV IEEE 33-bus radial distribution system as shown in Figure 6. The current levels in the various cases vary from zero to 75 A. Initially, linear HIF models were used to evaluate the fault currents. Most linear HIF models failed to reproduce the asymmetry of the fault currents, which made it difficult for feeder protection relays to distinguish load currents from fault currents. Later, diode based HIF models were developed to produce the asymmetric V-I characteristic loop of HIF currents. An HIF model is said to be realistic when it shows basic properties such as low current values, non-linearity in the V-I characteristics, and the presence of an electric arc. Many researchers have developed electric arc based HIF models, among them the Cassie and Mayr models. These are dynamic arc models derived from thermal principles. The Cassie arc model works better for high fault current conditions and is most suitable for low impedance arc faults. It is inaccurate near current zeros and at lower current values. The Mayr arc model works better for low current fault conditions and is most suitable for high impedance arc faults. It fails to provide details of higher current conditions, which makes it unsuitable for feeder protection relays. Many models have been developed by combining the Cassie and Mayr models so that they are effective for both low impedance and high impedance arc faults. These Cassie-

Choice of sampling frequency
Proper selection of the sampling frequency, which is an integral part of the proposed methodology, reduces the computational complexity of capturing the signature of a signal. It is observed that harmonics up to the 17th order are sufficient for accurate detection. The decomposed post-fault current signal at level 2 and level 3 with 17th order harmonic FFT analysis is shown in Figure 9. A lower sampling frequency is enough for accurate signal detection by means of DTCWT. 35 The sampling frequency determines the rate at which discrete samples s are taken. Mathematically, sampling is the multiplication of the sampled function s(t) by the pulse sequence Ω(t), and it is given in Equation (14).
where S(t) is the extracted voltage signal. The average error cp is calculated as shown in Equation (15).
In Table 3, the average error is calculated for different sampling frequencies at different levels of decomposition, keeping the limit of 2N in consideration, where N is the number of samples. Using Equations (14) and (15), it is observed from Table 3 that at a sampling frequency of 1024 the proposed methodology shows the highest accuracy. In this method, 1024 samples are collected over 10 cycles. The selection of the accurate sampling frequency is also shown in Figure 10. DTCWT suffers from high computation time compared with the ordinary DWT because of its two wavelet trees. 36 To reduce the computational time, the number of decomposition levels is reduced to 2, which is found to have little effect on the accuracy of the proposed methodology, as shown in Table 3. The modeled HIF is injected into the IEEE 33-bus radial distribution network test system to validate the proposed methodology.
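A quick arithmetic check relates the sampling choice to the 17th order harmonic. The sketch below assumes only the Nyquist criterion (at least two samples per period of the highest harmonic of interest) and the stated figure of 1024 samples over 10 cycles; it does not depend on the system frequency. The function names are hypothetical.

```python
def samples_per_cycle(n_samples, n_cycles):
    """Average number of samples taken in one fundamental cycle."""
    return n_samples / n_cycles


def highest_resolvable_harmonic(n_samples, n_cycles):
    """Largest harmonic order that still gets at least two samples per period."""
    return int(samples_per_cycle(n_samples, n_cycles) // 2)


per_cycle = samples_per_cycle(1024, 10)
print(per_cycle, highest_resolvable_harmonic(1024, 10))
```

With 102.4 samples per cycle, the 17th harmonic (which needs more than 34 samples per cycle) is covered with a wide margin, which is consistent with the statement that a comparatively low sampling frequency is sufficient.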

FIGURE 10 Graphical representation of accurate sampling frequency
The RSSVM technique is tested for the non-fault condition using features extracted over 0.30-0.50 s at bus 6. The non-fault voltage signal data collected across all the measuring devices are divided into variables $X_1$ and $X_2$. The $X_1$ variables are the DTCWT detail coefficients of the fault signal at the fault zone, and the $X_2$ variables are the DTCWT detail coefficients of non-fault signals at the healthy zones. When a fault occurs in the distribution system, the fault information is mapped into the arc voltage. The energy of each frequency component changes according to the change in the system. Therefore, after DTCWT feature extraction of the arc voltage, the energy of every sub-band is calculated. This sub-band energy is called the energy entropy. The fault signal energy entropy is compared with the normal signal energy entropy. Signals that violate the threshold value are treated as fault signals; the fault signal energy is 5 times the ordinary signal energy. In Figure 11, the scatter diagram of axes $X_1$ and $X_2$ is drawn taking both the positive and negative classes as non-fault signals. Positive class data sets are shown in blue and negative class data sets in red. The data sets are trained with a multi-class SVM pattern recognition program in MATLAB. All the data sets converge toward one another in a particular area, which results in no classification because all data sets belong to non-fault signals. The various fault cases are discussed below.

Case 1: Fault between bus number 6 and 26
In this case, an HIF is applied to the IEEE 33-bus test system between buses 6 and 26. The fault voltage and current signals are captured by the measuring devices located near the fault zone at buses 5, 26, and 30. The arc voltage is shown in Figure 12. In Figure 12, the voltage looks normal at the substation side, but there is a slight deviation in the spike of the signal. The arc parameters in this case are $V_P = V_N = 5600$ V, $R_P = 800$ Ω, and $R_N = 150$ Ω, which form an asymmetric V-I curve. The fault current is shown in Figure 13, where the fault current value is lower than the normal value. The fault signal is analyzed by DTCWT with 3rd order, and the extracted features are selected for the next process. Similarly, the extracted features of the non-fault zone data sets from the other measuring devices are collected according to the proposed methodology (Table 4).
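The asymmetry produced by these arc parameters can be illustrated with a simple two-branch model. The sketch below assumes an antiparallel diode model of the Emanuel type, in which current flows through $R_P$ only when the instantaneous voltage exceeds $V_P$ and through $R_N$ only when it falls below $-V_N$; the source does not reproduce the model of Figure 6, so this structure is an assumption. The peak phase voltage is also an assumption, derived from the 12.66 kV rating taken as a line-to-line RMS value.

```python
import math


def hif_current(v, v_p=5600.0, v_n=5600.0, r_p=800.0, r_n=150.0):
    """Instantaneous fault current of an assumed antiparallel diode HIF model."""
    if v > v_p:
        return (v - v_p) / r_p  # positive half-cycle branch
    if v < -v_n:
        return (v + v_n) / r_n  # negative half-cycle branch
    return 0.0  # both diodes blocked around the voltage zero crossing


# Assumed peak phase-to-ground voltage of a 12.66 kV (line-to-line RMS) feeder.
v_peak = 12660.0 * math.sqrt(2.0 / 3.0)
samples = [v_peak * math.sin(2 * math.pi * k / 200) for k in range(200)]
currents = [hif_current(v) for v in samples]
print(max(currents), min(currents))
```

Because $R_N$ is much smaller than $R_P$, the negative half-cycle current peak is several times larger than the positive one, and the current stays at zero while the voltage magnitude is below 5600 V, giving the asymmetric V-I loop.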
These data sets are then fed to the SVM for classification. All fault signals are treated as the positive class and shown in blue, and all non-fault signals are treated as the negative class and shown in red, in the scatter diagram of Figure 14. In the scatter diagram, the majority of the blue data sets representing the fault signals accumulate in a particular area. These data sets are analyzed with the confusion matrix, a table that describes the performance of the classification, to obtain the classification performance. In this case, 532 data sets are collected from the 11 fault detectors placed at different locations in the test system. The confusion matrix for the case 1 fault is shown in Table 5. Table 5 reveals that out of 260 fault samples, 230 samples are grouped in zone 2 with a precision of 0.94. The overall efficiency of the classification is 91.54%. The overall accuracy is improved by hyper-tuning the SVM parameters. In this case, grid search SVM yields 98.87% overall accuracy, as shown in Table 6, and Random Search SVM yields 99.81% overall classification accuracy (CA) with a perfect precision of 1.0, as shown in Table 7. The error in locating the fault zone is less than 1.19%. Here the fault inception angle is taken as 90° (Va90), and the results at a fault inception angle of 0° are similar, in spite of a slight rise in the sum of the details of the decomposed voltage signal.
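The way overall accuracy and precision follow from a confusion matrix can be sketched as below. The matrix is a small hypothetical example, not the data of Tables 5 to 7, and the convention that rows are actual zones and columns are predicted zones is an assumption.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal (rows: actual, columns: predicted)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, k):
    """Share of samples predicted as class k that truly belong to class k."""
    predicted_k = sum(row[k] for row in matrix)
    return matrix[k][k] / predicted_k if predicted_k else 0.0


# Hypothetical three-zone confusion matrix for illustration only.
toy = [[8, 1, 1], [0, 9, 1], [1, 0, 9]]
print(overall_accuracy(toy), precision(toy, 0))

# Arithmetic aside: one misclassified sample out of 532 rounds to 99.81%.
print(round(100 * 531 / 532, 2))
```

The last line is only a consistency check on the rounding; the source does not state how many of the 532 data sets were misclassified.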

Case 2: Fault between bus number 23 and 25
In this case, the fault signal data sets are collected at the measuring devices located at buses 22 and 24. The non-fault data sets are collected at the other measuring devices located at different locations in the test system. An SVM based machine learning technique is implemented for the classification problem. The Radial Basis Function (RBF) is used as the kernel function to separate the non-separable data. All fault signals are treated as the positive class and shown in blue, whereas all non-fault signals are treated as the negative class and shown in red. A total of 532 data sets are collected at the fault location and other locations. The scatter diagram of axes $X_1$ and $X_2$ is shown in Figure 15. The confusion matrix for case 2 is shown in Table 8.
With the normal SVM, out of 260 fault signal data sets, 238 are predicted as belonging to zone 1, a 91.5% efficiency. The overall efficiency of the classification is 90.03%. In this case, grid search SVM yields 97.35% overall accuracy and Random Search SVM yields 99.81% overall classification accuracy (CA), as shown in Table 8. The error in locating the fault zone is less than 1.19%.
In case 3, the fault occurs between buses 5 and 16 in the presence of 35 dB noise. The fault signal is analyzed by the DTCWT signal processing technique, and the fault data sets are collected at the fault detectors located at buses 14 and 17. The detail coefficients are extracted from the fault signal at levels 4 and 5. The non-fault data sets are collected from the measuring devices placed at different locations in the test system. A multi-class SVM classifier is used for the classification problem. A total of 527 data sets are trained through the SVM program. The trained data sets are analyzed with the confusion matrix shown in Table 9. The algorithm is competent enough to identify the HIF and the proper zone in the presence of noise.
With the regular SVM, out of 260 fault signal data sets, 236 are predicted as belonging to zone 3, a 90.76% efficiency. The overall efficiency of the classification is 90.32%. In this case, grid search SVM yields 97.22% overall accuracy and Random Search SVM yields 99.81% overall classification accuracy (CA), as shown in Table 9. The error in locating the fault zone is less than 1.19%. Current and voltage signals in practical applications are always distorted by noise. Noise, often known as interference, is described as unwanted electrical signals that modify or collide with the source signal. As a result, the usefulness of the suggested approach for detecting, classifying, and locating the HIF zone has been examined in a noisy environment as well. Noise in a distribution system may be found throughout the time series of the signal and has a regular probability distribution. The influence of noise should be examined thoroughly to assess the reliability of the proposed methodology. The noise is mathematically denoted by the signal-to-noise ratio (SNR), and it is formulated as
$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\!\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right)$$
where $P_{\mathrm{signal}}$ and $P_{\mathrm{noise}}$ are the signal power and the noise power. High noise is introduced into the arc voltage and the proposed methodology is tested. SNR conditions of 25 and 10 dB are considered in analyzing the HIF signal. The HIF fault voltage under noise is shown in Figure 16.

FIGURE 16 HIF voltage with noise conditions SNR = 25 dB and SNR = 10 dB
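A noisy test signal at a prescribed SNR can be generated as sketched below. The example assumes zero-mean white Gaussian noise, which the source does not specify, and uses a unit sine wave as a hypothetical stand-in for the arc voltage. The noise power is chosen from the signal power so that the ratio in decibels matches the target.

```python
import math
import random


def add_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise (an assumption) at the requested SNR."""
    rng = random.Random(seed)
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """SNR in dB estimated from a clean signal and its noisy version."""
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical stand-in for the arc voltage: 50 cycles of a unit sine wave.
clean = [math.sin(2 * math.pi * k / 100) for k in range(5000)]
for target in (25.0, 10.0):
    noisy = add_noise(clean, target, seed=1)
    print(target, measured_snr_db(clean, noisy))
```

The measured value differs slightly from the target because the noise power is estimated from a finite number of samples.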
Under the noisy condition of 25 dB SNR, the accuracy of the proposed algorithm is slightly reduced to 98%. The proposed methodology accurately detects the fault zone with 92.10% overall classification accuracy under the high-noise condition of 10 dB SNR. The accuracy remains above 90%, which ensures that a relay based on the proposed methodology can trip during a fault.
Figure 17 shows that distributed generation sources are injected at buses 18, 22, 25, and 33. Reactive power compensators are also provided. The distributed generators are connected through voltage source converters and controlled by the traditional droop control method. The radial bus system is made meshed by connecting buses 25 and 29, buses 8 and 21, and buses 12 and 22, as represented by the dotted lines. In this case, the HIF is taken to occur between buses 23 and 25. A total of 532 data sets are considered for the zone classification problem. A random search hyper-tuned SVM is used as the classifier for the fault zone. A total of 260 data sets are collected from the fault zone measuring devices through the graph theory and genetic algorithm based search method proposed in this paper. The confusion matrix of case 4 is tabulated in Table 10.
In Table 10, out of 260 fault data sets, 259 are classified as fault data sets and grouped into zone 1. This shows that the fault occurred in zone 1, which is isolated from the healthy zone in the distribution system. The overall efficiency of the classification is 99.624%.
The system for the following case is shown in Figure 18. In the first stage of the proposed algorithm, the measuring devices are placed optimally through the proposed graph theory and genetic algorithm based search method to collect the data. The protection area is divided into three zones, which are tabulated in Table 11. In this case, the fault occurs between buses 11 and 12. The pre-fault and post-fault voltage and current signals are collected from the optimally placed measuring devices. The data are transmitted over the transmission control protocol (TCP) using the IEEE C37.118 format. The program is written in Python, which allows the data to be received, sent, and stored in two-way communication. The data are further processed with DTCWT; the decomposed coefficients are then filtered by the entropy measurement based feature selection method.

FIGURE 18
The entropy of the decomposed signals is measured. The entropy of the healthy signal is measured as 0.537, the HIF entropy as 0.595, and the entropy of non-HIF faults in the range 0.6-0.85. The decision rule successfully detected the fault and classified its type. These data are then passed to the second stage to identify the fault zone. Of the 260 filtered data sets of zone 1, 259 are successfully classified as zone 1, so the proposed methodology identifies the fault zone in the second stage. The overall classification accuracy is 97.56%. The overall classification accuracy decreases slightly when the proposed methodology is tested on the real time distribution system. Still, the proposed methodology yields satisfactory performance in detecting, classifying, and locating faults in the distribution network.
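The entropy-based decision rule described above can be sketched as a simple threshold test. Only the three reference values (0.537, 0.595, and 0.6-0.85) come from the text; the cutoff between the healthy and HIF classes is not stated, so the midpoint used below is an assumption, as are the function name `classify_by_entropy` and the "unclassified" fallback.

```python
HEALTHY_ENTROPY = 0.537        # reported entropy of the healthy signal
HIF_ENTROPY = 0.595            # reported entropy of the HIF signal
NON_HIF_RANGE = (0.6, 0.85)    # reported entropy range of non-HIF faults

# Assumption: place the healthy/HIF cutoff midway between the two reported values.
HEALTHY_HIF_CUTOFF = (HEALTHY_ENTROPY + HIF_ENTROPY) / 2


def classify_by_entropy(entropy: float) -> str:
    """Map a measured entropy value to a signal class using threshold bands."""
    if entropy < HEALTHY_HIF_CUTOFF:
        return "healthy"
    if entropy < NON_HIF_RANGE[0]:
        return "HIF"
    if entropy <= NON_HIF_RANGE[1]:
        return "non-HIF fault"
    return "unclassified"      # outside every reported range


labels = [classify_by_entropy(e) for e in (0.537, 0.595, 0.72)]
```

The HIF value of 0.595 lies close to the lower bound of the non-HIF range, so in practice the cutoffs would have to be set from the measured entropy distributions rather than from single reported values.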

FIGURE 19 HIF location identification methods comparison
As discussed earlier, the accuracy of the multilevel SVM is improved by hyper-tuning its parameters. In this paper, the random search and grid search methods are used to tune the hyperparameters and increase the performance of the proposed algorithm. For case 1, the comparison of the normal gradient search SVM with RBF kernel against the random search and grid search hyper-tuned SVMs is given in Table 12.
Table 12 shows that the proposed random search based SVM methodology outperforms the normal multilevel SVM and the grid search SVM. The proposed methodology is also compared with other existing methods available in the literature; the comparison is given in Table 13. The existing HIF location methods are compared with the proposed methodology in Figure 19, which clearly shows that the proposed methodology outperforms the other existing methods in the literature. Table 13 also shows that the Deep Learning method and the Stochastic Resonance method reach 100% accuracy, but both algorithms are restricted to the detection and classification of HIFs in the distribution network.
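The difference between the two tuning strategies compared in Table 12 can be illustrated with a minimal sketch. Grid search evaluates every combination on a fixed grid, whereas random search draws a fixed number of candidates from continuous ranges. The sketch assumes the two RBF-kernel SVM hyperparameters are the penalty C and the kernel width gamma, which are not named in this section. The function `toy_score` is a hypothetical stand-in for the cross-validated accuracy of the SVM and does not reproduce any result of this paper.

```python
import itertools
import math
import random


def grid_search(score, c_values, gamma_values):
    """Evaluate every (C, gamma) pair on a fixed grid and return the best pair and score."""
    best = max(itertools.product(c_values, gamma_values), key=lambda p: score(*p))
    return best, score(*best)


def random_search(score, c_range, gamma_range, n_iter, seed=0):
    """Draw n_iter (C, gamma) pairs log-uniformly and return the best pair and score."""
    rng = random.Random(seed)

    def log_uniform(low, high):
        return math.exp(rng.uniform(math.log(low), math.log(high)))

    best_params, best_score = None, -math.inf
    for _ in range(n_iter):
        params = (log_uniform(*c_range), log_uniform(*gamma_range))
        current = score(*params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(c, gamma):
    """Hypothetical accuracy surface peaking at C = 30, gamma = 0.02 (illustration only)."""
    return 1.0 / (1.0 + math.log10(c / 30.0) ** 2 + math.log10(gamma / 0.02) ** 2)


grid_best, grid_score = grid_search(toy_score, [1, 10, 100], [0.001, 0.01, 0.1])
rand_best, rand_score = random_search(toy_score, (1, 100), (0.001, 0.1), n_iter=9)
```

In an actual study, `toy_score` would be replaced by a function that trains the multilevel SVM with the given parameters and returns its validation accuracy. Both searches above use nine evaluations, but the random search is not tied to the grid points and can sample values between them.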

CONCLUSION
In this paper, a novel graph theory and machine learning based method for HIF detection, classification, and zone identification has been proposed for distribution systems. DTCWT has been used for signal decomposition because of its shift invariance property. An entropy measurement based method is used to select features from the decomposed signals, and decision rules derived from the entropy measurement are used to detect and classify HIFs. The Random Search Multilevel Support Vector Machine (RSMSVM) algorithm is used to classify the faulted zone. The proposed methodology is designed and developed under the standards of a commercial relay, the SEL-751 feeder protection system. The proposed method accurately locates the faulted zone despite multiple configuration changes in the distribution network. It also collects data from optimally placed measuring devices, which makes it cost effective. The method of selecting the sampling frequency makes the methodology more accurate. Both balanced and unbalanced networks under noisy conditions have been considered, which brings the proposed methodology closer to a realistic distribution system. The proposed methodology is effective under both high noise and low noise conditions, which shows the robustness of the algorithm. The authors have also shown that computational complexity can be reduced by selecting an appropriate sampling frequency along with the number of decomposition levels, which makes the algorithm faster. The proposed method has been tested on the radial balanced IEEE 33 bus test system, the unbalanced modified IEEE 33 bus test system, and the IEEE 39 bus test system. It has also been applied to a real time system. In every case study, the proposed methodology shows high accuracy. This technology can therefore be used in real time distribution systems for detecting, classifying, and locating HIFs.

DATA AVAILABILITY STATEMENT
Data is available.