Fault detection, classification and location for transmission lines and distribution systems: a review on the methods

A comprehensive review on the methods used for fault detection, classification and location in transmission lines and distribution systems is presented in this study. Though the three topics are highly correlated, the authors try to discuss them separately, so that one may have a more logical and comprehensive understanding of the concepts without getting confused. Great significance is also attached to the feature extraction process, without which the majority of the methods may not be implemented properly. Fault detection techniques are discussed on the basis of feature extraction. After the overall concepts and general ideas are presented, representative works as well as new progress in the techniques are covered and discussed in detail. One may find the content of this study helpful as a detailed literature review or as a practical technical guide.


Introduction
Methods for fault detection, classification and location in transmission lines and distribution systems have been intensively studied over the years. With the concepts associated with the smart grid attracting growing interest among researchers, the importance of building an intelligent fault monitoring and diagnosis system capable of classifying and locating different types of faults cannot be overstated.
The past 20 years have witnessed rapid development in various fields concerning the detection, classification and location of faults in power systems. Advances in signal processing techniques, artificial intelligence and machine learning, global positioning system (GPS) and communications have enabled more and more researchers to carry out studies of great breadth and depth, so that the limits of traditional fault protection techniques can be stretched. Furthermore, two major restrictions of online fault diagnosis systems are also being overcome. The first restriction is the difficulty in data acquisition. In addition to traditional measurement equipment such as potential transformers, current transformers and remote terminal units, newly developed intelligent electronic devices (IEDs) are being deployed [1] to obtain information at multiple nodes in the grids. Self-powered non-intrusive sensors are also being developed with the potential to form sensor networks for smart online monitoring of smart grids [2,3]. With more data available, researchers are able to develop intelligent fault diagnosis systems through mining knowledge from the data corresponding to different conditions. The effect of complex and varied network configurations can also be eliminated when the current and voltage signals can be collected by a large number of distributed sensors. The second restriction is the lack of communication and computation capability. The prospects of GPS-based synchronised sampling and high-speed broadband communications for IEDs in power grids were discussed in [1]. The application of phasor measurement units has also gained wide attention, and a brief introduction to them can be found in [4]. These technical improvements can guarantee fast response to faulty situations and the proper functioning of online monitoring systems based on sensor networks. The computational ability of computers has also increased rapidly. High-performance computing solutions such as server clusters are able to complete distributed computing tasks within a very short period of time, thus allowing methods with higher computational complexity to be implemented.
In this paper, we present a comprehensive review on the methods used in fault detection, classification and location. A simplified framework for fault detection, classification and location is illustrated in Fig. 1. In the first step, current and voltage signals are sampled and the sampled points are passed to the feature extraction module. This module then extracts features used by the fault detector, the fault classifier and the fault locator. The outputs are the fault type and the fault location provided by the fault classifier and the fault locator, respectively. Some of the works cover all three aspects, while some others focus on one or two of the aspects.

Feature extraction and fault detection
Although the current and voltage signals contain all the information within themselves, it is extremely hard to fit the raw signals into sets of rules and criteria capable of intelligently interpreting the underlying messages carried by the signals. This is where feature extraction techniques come in handy, as they dig out useful information purposefully and reduce the impact of variance within the studied system. After proper feature extraction techniques are used, researchers may gain a better awareness of the nature of the fault classification or location problems and thus solve them in a more coherent and efficient manner. Moreover, a reduced dimensionality of the data can sometimes boost the performance of certain algorithms used in the classifiers or locators, providing more accurate and robust results as fast as possible. In this section, methods used for feature extraction are presented together with detailed application examples. At the end of this section, a brief introduction to fault detection methods, which are highly dependent on the feature extraction process, is presented.
dramatically [5], which if detected and analysed properly may help protect the affected transmission lines and distribution systems to a great extent. A variety of methods used to analyse frequency characteristics of time-domain signals have been proposed, and three frequently used methods in fault diagnosis systems are presented as follows.
The Fourier transform (FT) is a widely used mathematical tool when signals need to be analysed in the frequency domain. For applications where time-domain signals and frequency-domain coefficients are both discrete, the transform is referred to as the discrete FT (DFT), which can be computed efficiently using the fast FT (FFT) algorithm. In [6], researchers used full cycle DFT and half cycle DFT (HCDFT) to remove DC and harmonic components and estimate phasor elements. Authors in [7][8][9] also adopted HCDFT to calculate fundamental and harmonic phasors used for fault-type classification. Hagh et al. [10] used full cycle FFT to determine fundamental components of currents and voltages.
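As an illustration of full cycle DFT phasor estimation, the following minimal sketch correlates exactly one cycle of samples with a complex exponential at the chosen harmonic. The function name, the window length of 32 samples and the test signal are assumptions made for this example; they are not taken from [6] or [10].

```python
import cmath
import math

def full_cycle_dft_phasor(samples, harmonic=1):
    """Estimate the phasor (peak amplitude and phase) of one harmonic.

    `samples` must span exactly one fundamental cycle.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * harmonic * k / n)
              for k, x in enumerate(samples))
    return 2.0 * acc / n

# Hypothetical test signal: DC offset + fundamental + third harmonic.
N = 32
signal = [0.5
          + 10.0 * math.cos(2 * math.pi * k / N + math.pi / 6)
          + 2.0 * math.cos(2 * math.pi * 3 * k / N)
          for k in range(N)]

fundamental = full_cycle_dft_phasor(signal)        # DC and 3rd harmonic rejected
third = full_cycle_dft_phasor(signal, harmonic=3)  # same window, other harmonic
```

Because the window covers a whole cycle, the DC offset and the other integer harmonics sum to zero in the correlation, which is why the full cycle DFT can be used to remove them.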
The wavelet transform (WT) is one of the most used feature extraction methods for various fault diagnosis systems, a comprehensive introduction to which can be found in [11]. In practice, most of the studies use discrete WT (DWT) rather than continuous WT (CWT) to decompose the original current and voltage signals, so that characteristics of the signals in multiple frequency bands can be revealed. Theoretically speaking, the multiresolution decomposition of signals can be done with a filter bank of quadrature mirror filters, and the filter bank decomposes the signal into detail coefficients at multiple levels and approximation coefficients at one level [12]. Thus, when implementing DWT, researchers need to decide which mother wavelet (which determines the properties of the filter bank) and which decomposition levels to use before actually creating the features. A comparison of different mother wavelets for fault detection and classification was provided in [13], which recommended bior3.9, Meyer, coif5, Db10 and Sym8 mother wavelets for fault detection. Note that different sampling rates were adopted in the studies reviewed below, so we should focus mainly on the bounds of the frequency bands rather than on the decomposition level itself. Authors in [14][15][16] simply used the coefficients in detail levels as features. The Meyer wavelet was selected in [14], while Db2 was chosen in [15,16], in which the researchers used frequency bands of 1-2 and 4-8 kHz, respectively. In [17][18][19][20], features were extracted by calculating the summations of absolute coefficients of detail levels. The coefficients in 97-195 or 99-199 Hz frequency bands were used in [17][18][19], where three Daubechies wavelets, namely Db8, Db4 and Db1, were chosen. Shaik and Pulipaka [20] used the bior2.2 wavelet and coefficients within the 480-960 Hz frequency band. Besides summation of coefficients, maximums of coefficients in detail levels were also used as features. For instance, Pradhan et al. [21] used maximums of coefficients at level 1 and level 2 details (corresponding to 2.5-10 kHz) and approximation as features.
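The relation between decomposition level and frequency band can be sketched as follows. With ideal half-band filters, the detail coefficients at level j nominally cover the band from fs/2^(j+1) to fs/2^j, where fs is the sampling rate. The 20 kHz sampling rate below is an assumption chosen so that levels 1 and 2 together span 2.5-10 kHz, which matches the band quoted for [21]; the function name is hypothetical.

```python
def detail_band_hz(sampling_rate_hz, level):
    """Nominal frequency band (lower, upper) of DWT detail coefficients.

    Assumes ideal half-band filters; real wavelet filters overlap somewhat.
    """
    upper = sampling_rate_hz / 2 ** level
    lower = upper / 2
    return lower, upper

fs = 20_000  # assumed sampling rate in Hz
bands = {level: detail_band_hz(fs, level) for level in (1, 2, 3)}
for level, (lo, hi) in bands.items():
    print(f"detail level {level}: {lo:.0f}-{hi:.0f} Hz")
```

The same level therefore corresponds to different bands at different sampling rates, which is why comparing studies by band limits is more meaningful than comparing them by level.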
Another way to use the coefficients is to calculate the energies of detail levels. Energy within the frequency band 3840-7680 Hz was used in [22], while 50-100 and 1.5625-3.125 kHz frequency bands were adopted in [23]. On the basis of wavelet energies, the wavelet energy entropy (WEE) introduced in [24] can be calculated, as used in [25]. In addition to WEE, He et al. [26] adopted wavelet singular entropy (WSE), a combination of WT, singular value decomposition and Shannon entropy, to create features. To make better use of the information contained in the detail levels, authors in [25,27] used the wavelet packet transform (WPT), which not only decomposes approximation coefficients, but also decomposes detail coefficients repeatedly. Thus, the frequency resolution in higher frequency ranges could be greatly improved. In [28], the multiwavelet packet transform, which is based on multi-WT (MWT) and WPT, was used by researchers. The MWT possesses properties including compact support, orthogonality and symmetry [29], which when combined with WPT can extract features with higher information density. As a result, extremely high classification accuracy was obtained in [28]. With a variety of mother wavelets adopted and coefficients in both high- and low-frequency detail levels used, the works mentioned above proved the effectiveness of DWT in facilitating the fault classification and location methods.
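The energy-based features can be illustrated with a small sketch. It uses the Haar (Db1) wavelet because its filter bank is easy to write by hand, computes the energy of each detail level and of the final approximation, and then a Shannon entropy over the relative energies. This entropy follows the common form of the WEE; the exact definition in [24] (for example any time windowing) may differ, and all function names here are hypothetical.

```python
import math

def haar_dwt(signal, levels):
    """Multi-level Haar DWT; returns (approximation, [detail_1, ..., detail_n]).

    The signal length must be divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details

def band_energies(signal, levels):
    """Energy of each detail level, followed by the approximation energy."""
    approx, details = haar_dwt(signal, levels)
    return [sum(c * c for c in d) for d in details] + [sum(c * c for c in approx)]

def wavelet_energy_entropy(energies):
    """Shannon entropy of the relative band energies (natural logarithm)."""
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up samples
energies = band_energies(signal, levels=2)
wee = wavelet_energy_entropy(energies)
```

Since the Haar transform is orthonormal, the band energies add up to the energy of the original samples, so the relative energies form a valid distribution for the entropy.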
As a method derived from CWT, the S-transform (ST) provides a joint time-frequency representation with frequency-dependent resolution based on a moving and scalable localising Gaussian window, as put forward by Stockwell et al. in [30]. The two-dimensional time-frequency representation of ST can effectively reveal local spectral characteristics that are especially useful in detecting and interpreting transient events [31]. Concretely, the calculation result of ST is stored in the S-matrix, on which basis the ST contours can be plotted for a two-dimensional visualisation and features can be further extracted. Some researchers selected ST rather than DWT partially in order to avoid some deficiencies of DWT, such as being sensitive to noise and unable to precisely reflect the characteristics of particular harmonics [32,33]. Samantaray and Dash [34] used standard deviation and energy of ST contour to help select faulty phase and faulty section. Hyperbolic ST (HST) was implemented in [35], where the researchers calculated change of signal energy and standard deviation of ST contour. Variance of the S-matrix and auto-correlation of the absolute value of the S-matrix were also used in [36]. In order to locate the fault point, amplitude and phase angle of the phasors and impedance to the fault point were calculated in [37]. On the basis of the fast discrete ST introduced in [38], Dash et al. [32] proposed the fast frequency filtering ST and calculated the maximum energy among frequency components of each phase to select the faulted phase. Another modification to ST, fast discrete orthonormal ST (FDOST), was used in [33] to obtain magnitudes of negative sequence components and high-level positive components. The authors also calculated the difference between the mean absolute magnitudes of the frequency bands at fault inception and before fault.

Modal transformation
Modal transformations such as the Clarke transformation (CT) were used in [39][40][41] to decouple three-phase quantities represented by a, b and c and transform them into components represented by α, β and 0, on the basis of which fault types were characterised by describing the relationships between phase quantities and modal components [41], and fault detection and location indices were calculated [39,40]. Authors in [42,43] adopted a modification of CT called the Clarke-Concordia transformation. The Karrenbauer transformation, another type of modal transformation, was used in [44] to facilitate the implementation of fault characteristics.
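A minimal sketch of the Clarke transformation is given below. It uses the amplitude-invariant scaling, which is one of several conventions in use; the works cited above may scale the components differently. The function name and the test values are assumptions for illustration.

```python
import math

def clarke_transform(a, b, c):
    """Map instantaneous phase quantities (a, b, c) to (alpha, beta, zero).

    Amplitude-invariant form: a balanced set of unit amplitude gives
    alpha and beta components of unit amplitude and a zero component of 0.
    """
    alpha = (2.0 * a - b - c) / 3.0
    beta = (b - c) / math.sqrt(3.0)
    zero = (a + b + c) / 3.0
    return alpha, beta, zero

# Balanced three-phase set sampled at an arbitrary angle.
theta = 0.3
ia = math.cos(theta)
ib = math.cos(theta - 2 * math.pi / 3)
ic = math.cos(theta + 2 * math.pi / 3)
alpha, beta, zero = clarke_transform(ia, ib, ic)
```

For the balanced set the zero component vanishes, so a non-zero value of this component is a natural indicator of ground involvement.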

Dimensionality reduction
Principal component analysis (PCA) is useful to reduce the dimensionality of data by mapping the data from the original high-dimensional space onto a low-dimensional subspace in which the variance of the data can be best accounted for [45]. Thukaram et al. [46] used PCA to obtain features from raw current and voltage signals. In [47], researchers applied PCA to the wavelet coefficients and used the principal components for fault classification and location tasks. Cheng et al. [48] proposed a feature extraction method based on random dimensionality reduction projection (RDRP). The measurement matrix used in RDRP to reduce the dimensionality of original input vector is a Gaussian random matrix, making this method independent of the training data. In addition, this method requires small memory space, as the feature extraction process is done with matrix multiplication.
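The idea behind a random projection with a Gaussian measurement matrix can be sketched in a few lines. The matrix is drawn once, without looking at any training data, and feature extraction is a single matrix-vector product. The 1/sqrt(k) scaling, the fixed seed and the function names are assumptions of this sketch and are not taken from [48].

```python
import math
import random

def gaussian_measurement_matrix(n_features, n_inputs, seed=0):
    """Draw an n_features x n_inputs matrix with i.i.d. Gaussian entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_features)  # assumed scaling, common practice
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_inputs)]
            for _ in range(n_features)]

def project(matrix, x):
    """Reduce the dimensionality of x by matrix-vector multiplication."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

# Hypothetical input: 16 raw samples reduced to 4 features.
x = [math.sin(2 * math.pi * k / 16) for k in range(16)]
M = gaussian_measurement_matrix(n_features=4, n_inputs=16)
features = project(M, x)
```

Only the seed (or the matrix itself) has to be stored, which is consistent with the small memory footprint mentioned above.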

Other methods for feature extraction
Extra computation is needed for the above-mentioned methods to extract features from original current and voltage signals, which adds much computational burden to the monitoring devices. Thus, it is also suggested by some researchers to use sampled points of current and voltage signals within a quarter of a cycle, a third of a cycle, half a cycle or one cycle as features for fault detectors, classifiers and locators [22,49-52].
The current and voltage signals can also be used to calculate some quantities as features. In [53], the root mean square (RMS) values of phase currents and zero sequence current were calculated. Khorashadi-Zadeh [54] calculated normalised ratios of maximum absolute values of currents for two different phases first and then used the differences of the normalised ratios as features. Gracia et al. [55] calculated ratios of phase angle differences between phases and the ratio of zero sequence current amplitude to positive sequence current amplitude. Ratios of during fault and pre-fault amplitudes of quantities were used in [56]. In [57], authors calculated superimposed sequence components of current signals and measured apparent impedance of faulted lines.
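Two of the quantities mentioned above, the RMS value and the zero sequence current, can be computed directly from the samples, as in the following sketch. The signals are synthetic and the amplitudes are made up for illustration; the function names are hypothetical.

```python
import math

def rms(samples):
    """Root mean square value of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def zero_sequence(ia, ib, ic):
    """Instantaneous zero sequence component, (ia + ib + ic) / 3."""
    return [(a + b + c) / 3.0 for a, b, c in zip(ia, ib, ic)]

def phase_current(amplitude, shift, n=64):
    """One cycle of a sinusoidal phase current (synthetic test data)."""
    return [amplitude * math.cos(2 * math.pi * k / n + shift) for k in range(n)]

# Healthy case: balanced currents of amplitude 10.
ia = phase_current(10.0, 0.0)
ib = phase_current(10.0, -2 * math.pi / 3)
ic = phase_current(10.0, 2 * math.pi / 3)
i0_healthy = rms(zero_sequence(ia, ib, ic))

# Made-up unbalanced case: phase a current rises to amplitude 50.
ia_fault = phase_current(50.0, 0.0)
i0_fault = rms(zero_sequence(ia_fault, ib, ic))
```

In the balanced case the zero sequence RMS is numerically zero, whereas the unbalanced case gives a clearly non-zero value that can serve as a feature.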

Fault detection based on extracted features
Generally speaking, the task of fault detection is done prior to fault-type classification and fault location. When an independent method is used for fault detection, the classifier and the locator are triggered after a fault is securely detected. This can be done easily by setting thresholds for the extracted features. Moreover, in the case where the classifier or the locator is capable of distinguishing between faulty and non-faulty states, there is no need to implement additional fault detection methods. One scheme to perform fault detection in this case is to use an individual classifier to differentiate faulty and non-faulty states. The other scheme is to add the non-faulty state to the output categories, and a fault is detected whenever the output is other than the non-faulty state. Considering the learning abilities of the models used for classification, there is no essential difference between the two schemes. Thus, for clarity, we only present here either methods that are used in some special cases or representative methods that are independent of the classification methods discussed in detail in Section 3.
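Threshold-based detection on an extracted feature can be sketched as follows. To mimic the idea of a fault being "securely" detected, the sketch requires the feature to stay above the threshold for a few consecutive samples; this confirmation count, the threshold value and the feature values are assumptions for illustration only.

```python
def detect_fault(feature_values, threshold, consecutive=3):
    """Return the index at which a fault is confirmed, or None.

    A fault is confirmed once the feature has exceeded `threshold`
    for `consecutive` samples in a row.
    """
    run = 0
    for index, value in enumerate(feature_values):
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return index
    return None

# Made-up feature sequence: one isolated spike, then a sustained rise.
feature = [0.1, 0.2, 5.0, 0.1, 4.0, 5.0, 6.0, 7.0]
confirmed_at = detect_fault(feature, threshold=1.0)
```

The isolated spike at index 2 does not trigger the detector, while the sustained rise is confirmed at index 6. A larger confirmation count reduces false alarms at the cost of a longer detection time.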
Negative sequence components were calculated in [47] for fault detection. For a more stable detection of faults, the authors designed a joint fault indicator by convolving the time derivative of the negative sequence components with a triangular wave, so that the chance of issuing false alarms can be reduced. This fault detection method using the joint fault indicator also shows robustness in cases of frequency deviation and amplitude variation.
In [58], the author proposed a wavelet-based method for real-time fault detection in transmission lines. The border effects of the sliding windows used to obtain the wavelet coefficients used for energy calculation were considered, allowing a shorter detection time than considering the transients alone. The method was not affected by the choice of mother wavelet and had no time delay for fault detection for compact and long wavelets.
A number of studies have been made on the detection of high impedance faults (HIFs) [59][60][61][62], for traditional detection techniques may fail when a HIF occurs. In [59], authors used DWT with quadratic spline mother wavelet to extract high-frequency information for HIF detection. Lai et al. [60] converted scale coefficients and wavelet coefficients obtained by DWT to RMS values to help detect HIF. PCA was applied to mean values of DWT coefficients in different frequency bands to reduce the dimensionality of features in [61]. The method introduced in [58], as presented earlier, was used for HIF detection in [62].
Some works cited in this paper reported the time used for fault detection. For some of these studies, the time needed for fault detection was less than 10 ms, which is half a cycle in a 50 Hz system, as summarised in Fig. 2. It is worth noting that some of the methods were even able to detect faults within 2 ms. If we take into consideration the time needed for classification of faults (over 30 ms in many cases), we see that the difference made by fault detection time on the overall performance of fault classification and location systems is not very significant. Nevertheless, detecting faults as fast as possible while maintaining high robustness and accuracy is worth the efforts of researchers.

Fault-type classification
Fault-type classification plays a significant role in protective relaying for transmission lines and power distribution systems, and researchers have therefore had constant interest in developing new, robust and accurate fault classification algorithms and models for decades. The majority of the classification methods adopt classifier models based on statistical learning theory [63], while some other works used logic flows based on experience and observation of collected data. It is noteworthy that the development of studies in this particular field has been closely tied to the development of pattern recognition and machine learning (more specifically, supervised learning algorithms for classification). In this section, a detailed review of methods for fault-type classification is provided from a developmental and comprehensive point of view.

Fault classification based on logic flow
If no machine learning or artificial intelligence based algorithms are implemented, usually a tree-like logic flow with multiple criteria is used. In [64], authors compared the values of four extracted features for three phases and ground to pre-set thresholds. If any one of the values exceeds its threshold, the corresponding phase (or ground) is involved in the fault. Researchers in [65][66][67] extracted the features using WMA and generated logic flows based on observations of the characteristics of the features. At each node in the logic flow, certain comparisons were made between feature values or between a feature value and a threshold. Authors in [40,44] adopted modal transformation for feature extraction. CT was used in [40] to produce fault detection indexes for each phase. Thresholds were then added to complete the classification task. In [44], Karrenbauer transformation and WT were used. Modulus maxima of the WT were then fed to the logic flow to decide the fault type. WT and Shannon entropy were used in [26,68] to produce features. In [26], where the authors used the WSE method, logic flows were implemented after the features related to the entropies were calculated.
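The simplest logic flow of this kind, one threshold comparison per phase and one for ground, can be sketched as follows. The feature values, thresholds and labels are hypothetical and only illustrate the decision structure described for [64], not the features actually used there.

```python
def classify_by_thresholds(features, thresholds):
    """Return the fault type implied by per-phase threshold comparisons.

    `features` and `thresholds` map "A", "B", "C" and "G" (ground) to
    numbers; a phase (or ground) is involved if its feature exceeds
    its threshold.
    """
    involved = [name for name in ("A", "B", "C", "G")
                if features[name] > thresholds[name]]
    return "".join(involved) if involved else "no fault"

thresholds = {"A": 1.0, "B": 1.0, "C": 1.0, "G": 1.0}  # made-up values
fault_type = classify_by_thresholds(
    {"A": 5.0, "B": 0.1, "C": 0.2, "G": 3.0}, thresholds)
```

Here phase A and ground exceed their thresholds, so the sketch reports a phase-A-to-ground fault. Practical logic flows add further nodes, for example to reject inconsistent combinations.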

Artificial neural network
Artificial neural networks (ANNs) are a family of non-linear statistical models and learning algorithms intended to imitate the behaviour of connected neurons within biological neural systems, and they have developed and evolved over a long period of time. Different ANN models have been used for applications in various fields, including fault classification in transmission lines and distribution systems.
Of all the ANN models, the feedforward neural network (FNN) is the simplest in configuration, and can be characterised as a single-layer or multi-layer perceptron. An FNN often has an input layer, an output layer and at least one hidden layer. Generally, the nodes (or neurons) in adjacent layers are fully connected, and the parameters (weights assigned for the connections and biases for the nodes) decide the output of the network given an input. To put it simply, the so-called learning process is carried out by adjusting the parameters of the network, so that the output satisfies certain conditions, such as estimating a function accurately [69]. The application of FNN with back-propagation (BP) in power systems dates back to the late 1980s [70] and early 1990s [71], not long after the BP algorithm was first developed in 1986 by Rumelhart et al. [72]. Since then, a considerable number of studies have used FNNs for the task of fault-type classification. As the FNN is the simplest type of ANN, FNNs are often referred to simply as ANNs or NNs. Moreover, as the training process of FNNs mainly uses the BP algorithm, the term BPNN is also used (though BP is also used in many other ANN models). A two-hidden-layer FNN was used in [73], where the hidden layers have 20 and 15 nodes, respectively. The authors used five consecutive sample points of voltage and current of each phase as the input, and 11 output nodes representing ten fault types and the no-fault state form the output layer, thus forming a 30-20-15-11 FNN. In [74], authors decomposed the voltage signals into six frequency ranges, and the energy of each range was calculated, creating 18 features for the input layer. The 18-12-3 FNN is capable of achieving fault phase selection. Xu and Chow [75] used FNN to identify two major fault causes in distribution systems, namely tree contact and animal contact. 
The FNN structure introduced in [22] is 40-30-4, where five sample points of both zero sequence current and voltage signals were added to the input used in [73]. The authors adopted binary coding for the output layer, thus four nodes were used for 11 fault situations (ten fault types and the no-fault state). In [10], authors employed separate FNN modules to identify different types of faults, so that each network had fewer patterns to learn. Researchers implemented Clarke's transform and DWT in [41] to produce appropriate input for the FNN. For the dataset used in [41], the 12-24-48-4 structure outperforms the 12-6-12-4 and the 12-12-24-4 structures, for such a structure has better learning ability.
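To make the layer notation concrete, the following sketch builds a fully connected network with the 30-20-15-11 layout reported for [73], runs one forward pass and counts its parameters. The weights are random and untrained, and the sigmoid activation, the initialisation range and all function names are assumptions of this sketch; training by BP is not shown.

```python
import math
import random

def init_network(layer_sizes, seed=0):
    """Create (weights, biases) for each pair of adjacent layers."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
              for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(network, x):
    """Propagate x through the network using sigmoid activations."""
    for weights, biases in network:
        x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
             for row, b in zip(weights, biases)]
    return x

def count_parameters(network):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)

network = init_network([30, 20, 15, 11])
output = forward(network, [0.1] * 30)   # placeholder input vector
n_params = count_parameters(network)    # 620 + 315 + 176 = 1111
```

The parameter count grows quickly with the layer widths, which is one reason why splitting the task across several smaller networks, as in [10], leaves each network with less to learn.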
Radial basis function networks (RBFNs) are FNNs that use radial basis functions as activation functions for hidden nodes. Typically, an RBFN has one hidden layer, and the activation functions are Gaussian functions whose response to x decreases monotonically as the distance from x to the centroid c increases [76]. In [77,78], Park and Sandberg proved that an RBFN with one hidden layer is capable of approximating any bounded continuous function. Taking the deficiencies of FNNs with sigmoid activation functions into consideration, researchers used RBFN to build fault-type classifiers with good classification performance. Minimal RBFN (MRBFN) was introduced in [79,80] to systematically decide the minimal number of hidden nodes, reducing both the number of hidden nodes and training time. The linear iterated Kalman filter was used to adjust the parameters of the MRBFN in [79]. Orthogonal least squares algorithm and recursive least squares algorithm were used for the learning procedure in [35,49,81], respectively. In [81], two RBFNs were trained separately to classify faults involving earth and not involving earth, respectively.
Probabilistic neural network (PNN) is another type of FNN that uses exponential activation functions. The PNN structure proposed in [82] by Specht has four layers, namely input layer, pattern layer (hidden layer), summation layer and output layer. The activation function of nodes in the pattern layer is $\exp\left(-(w_i - x)^{T}(w_i - x)/(2\sigma^{2})\right)$, where $x$ is the input vector, $w_i$ is the weight vector of the $i$th pattern node and $\sigma$ is the smoothing factor. The summation nodes sum the values given by pattern nodes belonging to the same category, and the classification result is given by the output layer after comparing the sums calculated by the summation nodes. No iterative training process is required for PNN, as one pattern node is added for each $x$ in the training set, and the weight vector is set equal to $x$. This indicates that a PNN can be trained very quickly. Furthermore, no retraining is needed when new training samples are added [83]. An early implementation of PNN in power system fault classification is [84], where the authors found that the classification rate of PNN was 10% higher than FNN for the case they studied. In [14,85], researchers used PNNs with features extracted by DWT as the input. Nine pattern nodes were used in [14]. Both studies achieved classification accuracies of near 100%. ST was implemented in [34] to extract features for PNN, and the pattern layer had five pattern nodes. Mirzaei et al. Recently, the Chebyshev neural network (ChNN), which belongs to the functional link neural networks, was used in [27,87] for fault classification in transmission lines. In these studies, the Chebyshev polynomials were used as functional expansion to map the original input into higher-dimensional space, and the hidden layer was thus replaced, leaving only one layer in the network. The researchers compared the classification results obtained by the ChNN, and proved it a very effective method for fault classification in transmission lines with high accuracy. 
Further, as a ChNN has the single-layer structure, only one parameter is tunable, making it easier to implement than methods such as support vector machine (SVM) and other ANN models.
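The PNN described earlier in this subsection can be sketched compactly, since "training" only stores the samples as pattern nodes. The sketch follows the description given here (Gaussian pattern activations summed per class, largest sum wins) and does not normalise by class size, which some formulations do. The two-dimensional feature vectors, labels and smoothing factor are made up for illustration.

```python
import math

def pnn_classify(training_set, x, sigma=0.5):
    """Classify x with a probabilistic neural network.

    `training_set` is a list of (weight_vector, label) pairs, one pattern
    node per training sample. Each pattern node outputs
    exp(-||w - x||^2 / (2 sigma^2)); the summation layer adds the outputs
    per label and the output layer picks the largest sum.
    """
    sums = {}
    for w, label in training_set:
        sq_dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + activation
    return max(sums, key=sums.get)

# Hypothetical training samples (feature vector, class label).
patterns = [([0.0, 0.0], "normal"), ([0.1, 0.0], "normal"),
            ([1.0, 1.0], "AG"), ([0.9, 1.1], "AG")]
first = pnn_classify(patterns, [0.05, 0.05])
second = pnn_classify(patterns, [1.0, 0.9])

# Adding a new class only appends a pattern node; nothing is retrained.
patterns.append(([5.0, 5.0], "BC"))
third = pnn_classify(patterns, [4.9, 5.1])
```

The smoothing factor is the only tunable quantity in this sketch; a small value makes the classifier behave like a nearest-neighbour rule, while a large value blurs the class boundaries.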

Support vector machine
SVM was invented by Cortes and Vapnik in 1995 [88], the theoretical foundation of which can be found in [89]. The main idea of SVM classifiers is to find an optimal hyperplane that maximises the margin between two groups of examples. By using non-linear kernel functions which map the examples into higher-dimensional spaces, one can obtain non-linear SVM classifiers. The structural-risk-minimising nature of SVM helps to control over-fitting. Moreover, the parameter optimisation process of SVM is a convex optimisation problem, which means falling into local optima can be avoided. These advantages have made SVM a powerful tool for fault classification in transmission lines and distribution systems. In [51,90], authors used SVM for fault classification in series-compensated transmission lines. Both studies used three separate SVMs for three phases and another SVM for ground fault detection. Polynomial and Gaussian kernels were used, and the Gaussian kernel outperformed the polynomial kernel in [90]. Authors in [16,91-93] used features extracted by DWT as inputs to SVMs. In [16], apart from one SVM for each phase, an extra SVM was added to distinguish between fault and transient switching conditions. In addition to DWT, authors in [47,94] used PCA to reduce the dimensionality of the wavelet coefficients before sending the coefficients to the SVMs for fault-type classification. Hardware implementation was also done using FPGA and a real-time power system simulator. Researchers also reported SVM classifiers using features extracted by ST [95,96]. Moravej et al. [33] implemented FDOST to obtain useful features and ranked the features by the Gram-Schmidt method for better classification performance. In [97], authors used the one-class quarter-sphere SVM (QSSVM) to detect and classify faults. The QSSVM is able to map the input vectors in a way that the inputs corresponding to the normal state are enclosed in the quarter sphere and the inputs of faulty states are kept outside of the quarter sphere. Satisfactory fault detection and classification results were obtained by the temporal-attribute QSSVM and attribute QSSVM, respectively.
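The arrangement of one SVM per phase plus one for ground can be illustrated with the sketch below. It shows the decision function of an already trained Gaussian-kernel SVM and the way four binary decisions are combined into a fault type. The support vectors, coefficients and bias are hand-picked placeholders rather than the result of training, and the function names are hypothetical; solving the underlying optimisation problem is not shown.

```python
import math

def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(support_vectors, dual_coefs, bias, x, gamma=1.0):
    """True if sum_i coef_i * K(sv_i, x) + bias > 0 (coef_i = alpha_i * y_i)."""
    score = bias + sum(c * gaussian_kernel(sv, x, gamma)
                       for sv, c in zip(support_vectors, dual_coefs))
    return score > 0

def decode_fault_type(a, b, c, g):
    """Combine the decisions for phases A, B, C and ground into a label."""
    phases = "".join(name for name, hit in zip("ABC", (a, b, c)) if hit)
    if not phases:
        return "undetermined" if g else "no fault"
    return phases + ("G" if g else "")

# Hand-picked placeholder model for phase A (not trained on real data).
phase_a_model = {"support_vectors": [[0.0], [1.0]],
                 "dual_coefs": [-1.0, 1.0],
                 "bias": 0.0}
phase_a_faulted = svm_decision(x=[0.9], **phase_a_model)
label = decode_fault_type(phase_a_faulted, False, False, True)
```

Splitting the task this way keeps each SVM binary, which suits the two-class nature of the basic SVM formulation.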

Fuzzy inference systems (FISs)
FISs employ fuzzy logic and perform inference operations based on fuzzy if-then rules. In contrast to Boolean logic, fuzzy logic allows the degree of truth to be indicated by values in the range [0, 1], with 0 representing absolute falsity and 1 representing absolute truth. A basic FIS can be divided into three stages, namely the fuzzification stage, the inference stage and the defuzzification stage [98]. The membership degrees of inputs for different membership functions are calculated in the fuzzification stage, which are then fed to the inference engine where the if-then rules are used. The defuzzification stage then gives the final decision based on the results of the inference stage, such as the classification decision for the input. Mahanty and Gupta [57] used sample points of post-fault three-phase currents to calculate characteristic features for the fuzzy rule base. The features were calculated as the differences of normalised ratios of phase current maximums. Other characteristic features include angles between the positive and negative sequence components and ratios of magnitudes of different sequence components, as used in [7,56]. Peak values of coefficients obtained by DWT and features extracted by ST were also used in [21,36], respectively. In [36], the initial fuzzy rule base was developed from a previously trained decision tree (DT), and the rule base was then simplified by using similarity measure and genetic algorithm. In [99], for the task of discriminating faults caused by animals, lightning and trees using imbalanced data, the E-algorithm, which heuristically finds the optimal fuzzy rules, was proposed. Linguistic and descriptive data records were transformed into numerical variables by the likelihood calculation module and passed to the classification module. In [100], authors used the fuzzy-neuro (also referred to as neuro-fuzzy and fuzzy-neural) approach for fault classification. 
The fuzzy-neuro approach combines ANN and fuzzy logic, so that uncertain knowledge can be properly represented and that learning from examples becomes possible. Zero, positive and negative sequence current components were used as inputs in [100]. In addition to fuzzy-neuro approach, adaptive-network-based FISs (ANFISs) have also been used, the details of which are explained in [101,102]. Concretely, an ANFIS is a five-layer network based on Takagi-Sugeno's fuzzy if-then rules, and a hybrid learning algorithm was adopted to identify the consequent parameters (parameters of the if-then rules) in forward pass by least-squares method and the premise parameters (parameters deciding the shape of the membership functions) in the backward pass by gradient descent [102]. Yeo et al. [53] used four ANFISs for RMS values of three-phase currents as well as zero sequence current, and both low impedance and HIFs were classified correctly. In [103], authors validated the robustness and precision of ANFIS by adding white noise to the test data.
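The three stages of a basic FIS can be illustrated with a deliberately small sketch with one input and two rules. The input is taken to be the ratio of zero sequence to positive sequence current magnitude; the membership function breakpoints, the two rules and the weighted-average defuzzification are assumptions chosen for illustration and do not reproduce any of the cited rule bases.

```python
def ramp_up(x, start, end):
    """Membership degree rising linearly from 0 at `start` to 1 at `end`."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)

def ground_involvement(ratio):
    """Fuzzy degree (0..1) to which a fault involves ground."""
    # Fuzzification: membership of the input in "low" and "high".
    mu_low = 1.0 - ramp_up(ratio, 0.0, 0.4)
    mu_high = ramp_up(ratio, 0.1, 0.5)
    # Inference with two rules and constant consequents:
    #   IF ratio is low  THEN ground involvement is 0
    #   IF ratio is high THEN ground involvement is 1
    fired = [(mu_low, 0.0), (mu_high, 1.0)]
    # Defuzzification: weighted average of the rule consequents.
    total = sum(strength for strength, _ in fired)
    return sum(strength * out for strength, out in fired) / total

score_small = ground_involvement(0.0)
score_mid = ground_involvement(0.25)
score_large = ground_involvement(0.6)
```

Intermediate inputs fire both rules partially, so the output changes gradually instead of switching at a single hard threshold.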

Decision tree
DTs refer to a class of tree-like graphs capable of making decisions. The fundamentals of DT are discussed in [104,105]. Concretely, three types of nodes are included in a DT, namely a root node, internal nodes and leaf nodes. For classification problems, the root node is where the decision-making process begins, and each leaf node represents a class label. Tests are made on the root node and each internal node, and the decision-making flow goes along the path that satisfies the test conditions. A suboptimal DT can be generated from the training dataset by greedy algorithms (e.g. ID3, C4.5 and classification and regression tree, CART) [105] in a reasonable amount of time while the accuracy requirement is satisfied. In order to overcome over-fitting, pruning procedures are performed to reduce the size of the generated tree [105]. Samantaray [52] used sample points of zero sequence current, three-phase currents and voltages to form the input vector for the DT classifier, which outperformed SVM classifiers. In [8,9], the first ten odd harmonics (up to the 19th harmonic) of voltage and current signals were obtained by HCDFT. The random forest (RF) algorithm was then implemented for classification of faults in single-circuit and double-circuit transmission lines. Specifically, an RF consists of a finite number of DTs, and a plurality vote among the trees gives the final decision [106]. As it turned out, the decision-making process could be performed accurately in less than a quarter cycle [8,9]. Upendar et al. [19] used DWT coefficients as features for CART DT and compared its performance with FNN. Both methods achieved high accuracy, while CART DT's performance was even better. CART DT was also used in [107], in which the authors extracted a set of differential features by HCDFT from voltage and current signals.
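The decision path through a DT and the plurality vote of an RF can be sketched as follows. The trees are written by hand as nested tuples purely for illustration; in practice they would be grown from training data by an algorithm such as CART, which is not shown. The feature indices, thresholds and labels are hypothetical.

```python
from collections import Counter

def tree_predict(node, x):
    """Follow the tests from the root until a leaf (class label) is reached.

    An internal node is a tuple (feature_index, threshold, left, right);
    the left branch is taken when x[feature_index] <= threshold.
    """
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

def forest_predict(trees, x):
    """Plurality vote among the predictions of the individual trees."""
    votes = Counter(tree_predict(tree, x) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hand-written trees over two made-up features.
trees = [
    (0, 1.0, "no fault", (1, 0.5, "AB", "AG")),
    (1, 0.5, "AB", "AG"),
    (0, 2.0, "no fault", "AG"),
]
decision_1 = forest_predict(trees, [3.0, 0.9])  # all three trees vote AG
decision_2 = forest_predict(trees, [1.5, 0.2])  # two trees vote AB
```

Each prediction needs only a handful of comparisons per tree, which is consistent with the short decision times reported for tree-based classifiers.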

Future outlook for fault classification methods
While the above-mentioned studies in transmission line and distribution system fault classification mainly adopted well-developed machine learning algorithms, many developments and new trends in the fields of machine learning and data mining are worth noting. In 2006, Hinton et al. presented the possibility of extracting feature representations from data using restricted Boltzmann machines (RBMs) or autoencoders [108], laying the foundation for deep learning (DL). The structure of a DL model is similar to that of a multi-layer FNN, but unsupervised feature learning from large amounts of unlabelled input data can help prevent the model from over-fitting and falling into local optima. Easy access to massive amounts of data and the high computing ability of machines have made such a pre-training method possible. Recent developments in DL have successfully improved classification ability in many fields [109], and the application of DL in power system fault classification is promising. Methods such as convolutional neural networks (CNNs) are also used to deal with multi-channel sequence recognition problems [110], providing new ideas for fault classification tasks, where three-phase current and voltage signals are also in the form of multi-channel time sequences.
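As a rough illustration of how a CNN treats multi-channel time sequences, the sketch below applies a single one-dimensional convolutional filter, followed by a ReLU, across all channels of a short signal. It is not taken from [110]; the function name `conv1d_multichannel`, the hand-set kernel weights and the toy signals are assumptions, and a real CNN would learn many such filters from data.

```python
def conv1d_multichannel(signals, kernels, bias=0.0):
    """Valid 1-D cross-correlation over a multi-channel sequence, then ReLU.

    signals: list of C channels, each a list of T samples
    kernels: list of C kernels, each a list of K weights
    Returns one feature map of length T - K + 1.
    """
    n_channels = len(signals)
    length = len(signals[0])
    width = len(kernels[0])
    feature_map = []
    for t in range(length - width + 1):
        acc = bias
        # Sum the contributions of every channel inside the sliding window.
        for c in range(n_channels):
            for k in range(width):
                acc += signals[c][t + k] * kernels[c][k]
        feature_map.append(max(acc, 0.0))  # ReLU activation
    return feature_map


# Toy three-channel sequence: channels 0 and 2 step in opposite directions
# at the third sample, channel 1 stays flat (illustrative values only).
signals = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, -1, -1],
]
# Hand-set difference kernels that respond to the step.
kernels = [
    [-1, 1],
    [0, 0],
    [1, -1],
]
feature_map = conv1d_multichannel(signals, kernels)  # -> [0.0, 2.0, 0.0]
```

The filter responds only at the window containing the step, which hints at how learned filters could pick out fault-induced changes across the phase currents and voltages.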

Fault location
A considerable number of studies have focused on fault location because accurate location of faults in transmission lines and distribution systems can greatly reduce the time to recovery. A comprehensive review of fault location in power systems is provided in [111]. In [1], where a smart fault location method was proposed, the background knowledge for fault location was also provided. Thus, in this paper, on the basis of existing review studies, we present the fundamentals and some new progress in fault location techniques.
For transmission lines, conventional fault location methods can be divided into impedance-based methods (phasor or time-domain based) and travelling wave based methods. For distribution systems, methods using superimposed components and power quality data may also be considered [1]. Depending on the source of data, fault location methods may be further categorised as single-end methods, double-end methods, multi-end methods and wide-area methods. In this paper, however, we present fault location methods in a different manner, as we focus only on some special portions of them. Owing to the fast development of wide-area methods and the need to build reliable large-scale smart grids, we take wide-area methods into account. Similarly, we consider fault location methods for series-compensated transmission lines and hybrid transmission lines because of the special properties that distinguish them from normal transmission lines. We also take modern artificial intelligence methods into account because of their good performance in fault location and their broad application prospects. Consequently, the following discussion concentrates on wide-area fault location algorithms, fault location algorithms for series-compensated transmission lines, fault location algorithms for hybrid transmission lines and artificial intelligence based fault location algorithms.

Wide-area fault location algorithms
Traditional fault location methods fail to locate faults when either of the monitoring devices at the terminals of the faulty line fails to record the fault waveform. Wide-area fault location methods are applications of the wide-area monitoring system, and they can overcome this situation by providing a viable solution to the fault location problem [112]. In other words, wide-area fault location methods can precisely locate the fault point within the entire large-scale transmission network by using the information provided by a small number of monitoring devices dispersed in the network. The authors in [113,114] proposed a non-linear optimisation-based synchronised algorithm. By acquiring the arrival times of voltage travelling waves at different sensor nodes in the network and splitting all the transmission lines at virtual bus nodes, a closed-form solution was obtained. In [115], multiple synchronised voltage measurements were utilised to model the fault location problem as a non-linear estimation problem, which was solved by applying a novel transform based on the pre-fault bus impedance matrix to convert the non-linear problem to a linear weighted least-squares problem. Azizi and Sanaye-Pasand [116] proposed a synchronised voltage-based non-iterative method by taking advantage of the substitution theorem. By replacing the faulted line with a suitably adjusted current source injecting the same amount of transmission line current, an equivalent network was established. The positive-sequence and negative-sequence network impedance matrices constructed from the pre-fault network topology were utilised to calculate the fault location using the linear least-squares method. In [117], through building a positive-sequence network, a matching degree factor, which is a function of fault distance and is equal to zero only at the exact fault point, was defined. Concretely, calculating the matching degree factor at every bus, each temporarily assumed to be the faulty bus, identifies the fault region. The fault location is then determined by calculating the factor along all the lines included in the fault region with a small step. Similarly, the impedance-based method proposed in [118] locates the fault in a hierarchical manner, by which the faulted zone, faulted line and fault point are located in turn.
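The two-stage search described for the matching degree factor can be sketched as a coarse-to-fine scan. The sketch below does not reproduce the factor of [117], which is built from a positive-sequence network; instead it takes the factor as a user-supplied function and demonstrates the search on a toy chain of buses where the factor is simply the distance to a known fault position. All names (`locate_by_matching_degree`, `bus_factor`, `line_factor`), the network and the numbers are hypothetical.

```python
def locate_by_matching_degree(lines, bus_factor, line_factor, step_km):
    """Coarse-to-fine search for the point where the factor is closest to zero.

    lines:       dict name -> (bus_a, bus_b, length_km)
    bus_factor:  callable(bus) -> factor with the fault assumed at that bus
    line_factor: callable(name, d_km) -> factor with the fault assumed
                 d_km away from bus_a on that line
    """
    # Stage 1: the bus with the smallest factor marks the fault region.
    buses = {bus for a, b, _ in lines.values() for bus in (a, b)}
    suspect = min(buses, key=lambda bus: abs(bus_factor(bus)))
    region = [name for name, (a, b, _) in lines.items() if suspect in (a, b)]

    # Stage 2: scan every line in the region with a small step.
    best = None
    for name in region:
        length = lines[name][2]
        n_steps = int(round(length / step_km))
        for i in range(n_steps + 1):
            d = i * step_km
            value = abs(line_factor(name, d))
            if best is None or value < best[0]:
                best = (value, name, d)
    return best[1], best[2]


# Toy chain A - B - C with a stand-in factor (NOT the factor of [117]):
# the absolute distance between the assumed and the true fault position.
position_km = {"A": 0.0, "B": 50.0, "C": 120.0}
lines = {"AB": ("A", "B", 50.0), "BC": ("B", "C", 70.0)}
true_fault_km = 62.5  # 12.5 km from bus B on line BC


def bus_factor(bus):
    return position_km[bus] - true_fault_km


def line_factor(name, d_km):
    return position_km[lines[name][0]] + d_km - true_fault_km


faulted_line, distance_km = locate_by_matching_degree(
    lines, bus_factor, line_factor, step_km=0.5
)
```

The step size bounds the resolution of the second stage, so a smaller step trades computation time for accuracy.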

Fault location algorithms for series-compensated lines
Series-compensated lines are installed with series capacitors (SCs) and metal oxide varistors (MOVs) to accomplish series compensation. In spite of the favourable performance of series compensation, the non-linear behaviour of the SC and MOV makes it difficult to detect the faulty segment for fault location. Thus, traditional approaches need improvement to suit the fault location task on series-compensated lines [111,119,120]. In [121], an impedance-based algorithm utilising double-end voltage and current signals was proposed. Impedance between the capacitor and the fault point was calculated to obtain the entire fault current. As the angle difference between fault voltage and fault current is minimised at the real fault point, the real fault point can be found by searching potential fault points along the entire line with small steps. Swetapadma and Yadav [122] used an artificial intelligence method to locate both multi-location faults and normal single faults. DWT was used to extract the third-level approximate wavelet coefficients from one pre-fault cycle and two post-fault cycles of voltage and current signals. The standard deviations of the approximate coefficients of the voltage and current signals were then calculated as inputs for an ANN. In [123], a time-domain model of a thyristor-controlled SC (TCSC) and a distributed transmission line model were built. The method requires synchronised information from both ends of the line, and the transient resistance of the TCSC measured during the first cycle after fault inception can be acquired as a fault section indicator.

Fault location algorithms for hybrid transmission lines
Similar to series-compensated lines, hybrid transmission lines consisting of overhead transmission lines and underground cables have discontinuity points, named joint-nodes, where reflections of the fault signal are generated. Another important property of a hybrid transmission line is the difference between the travelling wave velocities in the line and the cable. Therefore, conventional approaches need improvement to be suitable for hybrid transmission lines [124]. A travelling wave based algorithm was proposed in [125], in which the authors used transients caused by the opening of a circuit breaker instead of fault-induced transients. The arrival time of modal components of the voltage travelling wave was detected by WT, and the fault zone was then judged by the polarity of the reflections. Further, the wave speeds in the cable section and the overhead line were also calculated, after which the fault point was acquired through a normal double-end travelling wave method. In [126], the current signal was passed through two sample FIR filters to remove DC offset. Wavelet detail and approximate coefficients of voltage and current signals were obtained by applying DWT. The first-level detail coefficients (800-1600 Hz) were set as inputs to a neuro-fuzzy system to determine whether the fault section was on the overhead transmission line or the underground cable. Then, the third-level approximate coefficients (0-200 Hz) were set as inputs to another neuro-fuzzy system to calculate the fault location. In [127], the time reversal method was used to locate the fault. After recording the fault-originated transient waveform at an observation point, back-injection of the time-reversed measured fault waveform at the observation point was simulated for different guessed fault locations. By comparing the fault current energies at the series of guessed fault locations, the real fault point was identified as the one with the maximum fault current energy.
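The effect of the two wave velocities on a double-end travelling wave calculation can be illustrated with a short sketch. It assumes a line made of one overhead section (terminal A to the joint) followed by one cable section (joint to terminal B), synchronised first-arrival times at both terminals and known velocities. Unlike [125], which judges the fault zone from reflection polarity, this sketch selects the section by comparing the measured time difference with the value for a fault exactly at the joint. The function names, lengths and velocities are hypothetical.

```python
def arrival_time_difference(x_km, len_ohl, len_cable, v_ohl, v_cable):
    """Return t_A - t_B for a fault x_km from terminal A (overhead-line end)."""
    if x_km <= len_ohl:
        t_a = x_km / v_ohl
        t_b = (len_ohl - x_km) / v_ohl + len_cable / v_cable
    else:
        y = x_km - len_ohl  # distance from the joint into the cable
        t_a = len_ohl / v_ohl + y / v_cable
        t_b = (len_cable - y) / v_cable
    return t_a - t_b


def locate_hybrid_fault(dt, len_ohl, len_cable, v_ohl, v_cable):
    """Invert the time difference, section by section.

    dt is t_A - t_B in seconds; lengths in km; velocities in km/s.
    """
    # Time difference that a fault exactly at the joint would produce.
    dt_joint = len_ohl / v_ohl - len_cable / v_cable
    if dt <= dt_joint:
        section = "overhead line"
        x = (v_ohl * (dt + len_cable / v_cable) + len_ohl) / 2.0
    else:
        section = "cable"
        y = (v_cable * (dt - len_ohl / v_ohl) + len_cable) / 2.0
        x = len_ohl + y
    return section, x


# Illustrative parameters only: 58 km overhead line, 15 km cable.
params = dict(len_ohl=58.0, len_cable=15.0, v_ohl=2.9e5, v_cable=1.5e5)
dt_measured = arrival_time_difference(65.0, **params)
section, x_km = locate_hybrid_fault(dt_measured, **params)
```

Applying the single-velocity double-end formula to such a line would bias the estimate, which is why the velocities of both sections have to be determined first.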

Artificial neural network based algorithms
Owing to their capabilities of self-learning, self-organisation, fast processing, high fault tolerance and non-linear function approximation, different kinds of ANNs have also been applied to fault location tasks. In [128], approximate and detail coefficients in the bandwidth of 0-500 Hz were extracted by DWT from three-phase current and voltage signals at one end of a double-circuit transmission line, which were then used to train the ANN with the Levenberg-Marquardt algorithm to locate faults. In [129], the magnitudes of the fundamental components of three-phase voltage and current signals were extracted by DFT. Three vectors formed by different combinations of features were used as the inputs to three different modular ANNs. The results showed that the ANN with the features containing both voltage and current information had the best performance with respect to fault location accuracy and training speed. Complex-domain ANNs are simple extensions of standard feedforward real-domain ANNs, whose inputs, outputs, interconnection weights and biases are all complex numbers. In [130], the fourth-level detail and approximate coefficients were acquired by stationary WT (SWT) using the Db2 mother wavelet, and these features were input to a complex-domain ANN to locate faults. In [131], the authors used DWT to extract the first peak time in the first scale at the faulty buses from a 1/4 cycle of the positive-sequence post-fault currents, and these features were used as the input for a PNN. For different circuit structures, including single-circuit and loop structures, the PNN achieved an average error of 0 km for fault location. In [132], a two-stage fault location algorithm using RBF-based SVM and scaled conjugate gradient (SCALCG)-based ANN was proposed. In the first stage, magnitudes of the fundamental harmonics of the positive sequence voltage and current signals of the faulty phases were input to the RBF-based SVM to get an approximate fault area. In the second stage, the SCALCG-based ANN was implemented to output the precise fault location using high-frequency characteristics.

Fuzzy inference system based algorithms
As mentioned in [133], an ANFIS has non-linear approximation, fault-tolerant and self-learning abilities and is able to automatically refine the preset fuzzy rules. Thus, fault location can also be achieved by ANFISs [134-136]. In [134], fifth-level detail coefficients (93.75-187.5 Hz) containing the second and third harmonics of three-phase current signals were extracted by DWT using the Db4 mother wavelet. These features were used as the input for an ANFIS to locate faults, the efficacy of which was validated through a Monte Carlo simulation, and the maximum error for fault location was 5%. Kamel et al. [135] obtained impedance features, including magnitude and phase information of three-phase voltage and current, as inputs for an ANFIS, and the maximum error was about 4% for different fault conditions. In [136], the norm entropy of main frequency coefficients (0-62.5 Hz), harmonic coefficients (62.5-500 Hz) and transient coefficients (500-4000 Hz), acquired by six-level DWT using Db4 as the mother wavelet, was set as the input for ten ANFIS regression algorithms trained by the BP gradient descent method in combination with the least-squares method, and the average error was 0.25%.
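The frequency ranges quoted for the DWT levels follow from the dyadic band splitting of the transform, which the sketch below reproduces under the usual ideal half-band assumption. The sampling rates of 6 kHz and 8 kHz used here are not stated in the text; they are inferred because they reproduce the bands quoted for [134] and [136], and should be read as assumptions. The function name `dwt_bands` is hypothetical.

```python
def dwt_bands(fs_hz, levels):
    """Nominal frequency bands of a dyadic DWT with ideal half-band filters.

    Detail level j covers fs / 2**(j + 1) to fs / 2**j; the final
    approximation covers 0 to fs / 2**(levels + 1).
    """
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs_hz / 2 ** (j + 1), fs_hz / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs_hz / 2 ** (levels + 1))
    return bands


# Assumed 6 kHz sampling: the fifth-level detail band matches 93.75-187.5 Hz.
bands_6k = dwt_bands(6000, 5)

# Assumed 8 kHz sampling with six levels: A6 is 0-62.5 Hz, D6 to D4 together
# span 62.5-500 Hz and D3 to D1 together span 500-4000 Hz.
bands_8k = dwt_bands(8000, 6)
```

Real wavelet filters such as Db4 are not ideal, so neighbouring bands overlap somewhat and the figures are nominal.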

Support vector regression (SVR) based algorithms
In addition to fault classification, SVMs can also be applied to regression problems. By adopting an alternative loss function, named the ε-insensitive loss function, in the SVM formulation, SVMs are able to solve regression tasks. Such a technique is called SVR. SVR retains the properties of SVMs, such as using the structural risk minimisation principle to choose discriminative functions so as to reduce the possibility of over-fitting the data. It is also trained as a convex optimisation problem so that a global solution can be found [137,138]. In the method proposed in [139], current and voltage signals were denoised and the decaying DC offset was filtered out by SWT. A special determinant function transform was used to extract features from level 2-5 SWT detail coefficients taken over 1/4 of a cycle. After the fault type was classified by an SVM, the features were sent to the radial basis kernel SVR corresponding to the fault type. In [95], the authors used HST, which was implemented by replacing the Gaussian window of ST with an asymmetrical hyperbolic window, to extract features from current and voltage signals. Eleven different kinds of features obtained from the HST-matrix were used as the input of the corresponding SVR. In [140], wavelet packet decomposition (WPD) with the Db1 mother wavelet was used to extract distinctive fault features from 1/2 cycle of post-fault voltage signals after noise was eliminated by a low-pass filter. The eight-element features of sub-band energies of WPD level-nodes were then passed on to the SVR.
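The ε-insensitive loss that distinguishes SVR can be written in a few lines. The sketch below shows only the loss itself, not a trained regressor; the function name `eps_insensitive_loss`, the tolerance of 0.5 km and the example distances are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |y_true - y_pred| <= eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical fault at 50 km with a tolerance (eps) of 0.5 km.
inside_tube = eps_insensitive_loss(50.0, 50.4, eps=0.5)   # within the tube
outside_tube = eps_insensitive_loss(50.0, 52.0, eps=0.5)  # 2 km off, 1.5 km beyond the tube
```

Errors smaller than ε are ignored entirely, so only training samples on or outside the tube become support vectors, and the linear growth outside the tube makes the loss less sensitive to outliers than a squared error.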

Future outlook for fault location methods
With the development of large-scale smart grids, complex networks with insufficient measurement points are expected to become more and more common, giving wide-area methods great promise for wide implementation in the future. Moreover, compared with conventional impedance-based or travelling wave based fault location methods, machine learning based fault location methods have better adaptability and are less likely to be influenced by line parameters or fault parameters. With ever-increasing computation and communication abilities, machine learning based fault location will play a more significant role among methods for fault location. As with fault classification, advanced machine learning methods such as DL may perform better than the methods currently in use. Thus, machine learning algorithms, including DL methods, may be considered for future research in fault location.

Conclusions
This paper presents a review on the methods used for fault detection, fault classification and fault location in transmission lines and distribution systems. A variety of methods are introduced and representative works are presented in detail.
Prior to introducing the methods directly used in the three topics, we first give an overall review of the methods used for feature extraction, which lays the foundation for other fault diagnosis tasks. Different types of transforms as well as dimensionality reduction methods are presented. We can see that information across the low-frequency and high-frequency ranges is fully exploited, and researchers are more purposeful when choosing the feature extraction techniques as well as selecting the extracted features. Fault detection is presented on top of the feature extraction methods, as the detection techniques are highly dependent on feature extraction. Still, some noteworthy aspects and newly developed ideas regarding fault detection are presented. A brief summary of fault detection time in the literature is also provided.
For the fault classification task, we mainly put forward various machine learning algorithms that have been intensively implemented by researchers. In addition to classical models such as ANN and SVM, we also present some promising new models that have emerged lately. Since the development of fault classification methods is highly relevant to the progress made in artificial intelligence and machine learning, we propose possible trends for future work, including the application of models such as RBM and CNN.
As surveys of fault location methods can be found in existing literature, we mainly present some fault location methods under several topics that are of interest, including complex line conditions and important artificial intelligence based methods. We also put forward the possibility of using the latest machine learning models to facilitate the fault location tasks.