Fault classification and location identification in a smart DN using ANN and AMI with real-time data

This paper presents a real-time fault classification and location identification method for a smart distribution network (DN) using artificial neural networks (ANNs) and advanced metering infrastructure (AMI). It also describes the development of a testbed for real-time testing of the proposed approach. The testbed consists of a simulated power system model [running on a digital real-time simulator (DRTS)] and AMI. The core parts of AMI are smart meters (SMs), a communication network (developed using the DNP3 protocol over transmission control protocol/Internet protocol), a data concentrator (DC), and a Utility Operations Centre (UOC). Event-driven data from SMs are collected in the DC and then fed to the UOC, where they serve as inputs to the novel ANN-based fault classification and location identification algorithm. On the basis of the data received, the algorithm classifies the fault type and locates the fault with high accuracy. Both balanced and unbalanced fault types are tested on different nodes and lines throughout a DN modelled offline and on the DRTS. A comprehensive sensitivity analysis is performed to validate the effectiveness of the proposed method. Classification accuracy of over 99% is achieved when classifying all fault types, and above 95% accuracy is achieved when identifying the fault location.


Introduction
According to the US Department of Energy, the US market loses around 27 billion dollars per year due to power outages. More than 60% of electric power outages are due to faults on distribution networks (DNs) [1].
In comparison to transmission systems, a DN consists of several branches and laterals, which span from short lines in urban areas to long lines in rural areas. These lines are vulnerable to different types of electrical faults emerging from a variety of causes such as extreme weather conditions, equipment malfunction, fallen trees, and animal contacts. Power outages resulting from these faults can last from a couple of hours to days if the fault is not located accurately. In this context, fault location (FL) identification offers significant benefits for electric utilities, allowing them to narrow down the search area and to restore the electric power supply quickly.
Historically, a lack of data points has made it challenging for utilities to identify the FL in real time. Utility companies have had to rely on customer calls to determine the outage area and to dispatch the repair crew. The whole process of FL identification takes a long time, from several minutes to a couple of hours [2]. However, recent advancements in smart grid technologies can help to minimise outage times. Modern intelligent electronic devices and FL indicators can be used for FL identification, but it is not economical to install these devices on every line. Advanced metering infrastructure (AMI), defined as an integrated system of smart meters (SMs), communications networks, and data management systems that enable two-way communication between utilities and customers, can help utilities to identify an outage and its location quickly and more economically.
The widespread popularity of AMI has opened many doors to the use of data analytics and state-of-the-art algorithms in DNs. The core of AMI is the SM, and SMs have been deployed in large numbers around the globe during the last decade. For example, the number of SMs installed in the UK, the USA, and China reached over 170 million by the end of 2016 [3]. The rising popularity of these SMs, combined with their ability to provide an immense amount of electrical data, presents researchers with opportunities to extract useful information from them. Owing to the availability of high-speed data and advancements in the computational power of data processing devices, many utilities are already in the process of transitioning from traditional FL identification methods to knowledge-based (KB) methods [4,5].

Related work
Traditionally, FL identification methods in distribution systems are classified into impedance-based, travelling-wave-based and KB methods. Impedance-based and travelling-wave-based methods are beneficial for transmission systems, but given the increasing complexity of DNs, these methods may not be very effective. Impedance-based FL algorithms depend on high values of fault current at the main substation to calculate the fault resistance, which may lead to multiple estimations of the FL [6]. Travelling-wave-based methods correlate the FL with characteristic frequencies associated with specific travelling-wave paths. These methods are difficult to apply in a DN as they require high sampling frequencies, which limits their practical application [7]. Several such methods are proposed in [8][9][10]; they are easy to implement, but their accuracy is highly dependent on fault type, fault resistance, distributed generation (DG), load variations etc.
KB methods use computer programming and a KB approach to solve complex problems. These methods use a large set of training data to train different types of learning algorithms to predict outcomes. Typical examples of KB methods are artificial neural networks (ANNs), support vector machines (SVMs), fuzzy logic, genetic algorithm etc. Among KB methods, ANN is the most common method used in power systems in the areas of event detection [11], fault location identification (FLI) [12], bad data detection, and distribution management [13]. ANN and SVM are combined to locate different types of faults in radial DN utilising measurements at substations, relays, and circuit breakers [14]. The data is analysed using the principal component analysis technique, and the fault-type classification is performed based on the reactance utilising a combination of support vector classifiers and feed-forward (FF) NNs. However, the presence of line capacitance and distributed generation is not investigated.
A combination of ANN, SVM, and a decision-tree-based boosting method for FL is presented in [15]. This method uses voltage and current measurements at certain buses in the feeder to classify the faulted phase and identify the faulted line, but is only tested for single-phase faults. A multiple-hypothesis method is proposed to determine the faulted section on a feeder or a lateral using data from SMs and fault indicators [16]. This method reduces the fault search area by providing a list of credible FLs. Although the method shows good credibility for multiple outage scenarios such as multiple faults and missing outage reports, it is based on the assumption of a very simple functionality of SMs (providing outage notification only) and is not tested on a network having DGs.
Feature-based fault classification using fuzzy logic and k-nearest neighbours is presented in [17,18], but these methods are limited in scope and only present the methodology to identify the fault type. A review of machine learning techniques for FL identification, fault detection, and classification is presented in [19], which lists several machine learning methods and demonstrates their strengths in solving different sets of problems.

Paper contributions
The authors presented the preliminary idea of an FL identification method based on ANN in [12]. The algorithm was tested (offline) in a three-phase, unbalanced DN. The algorithm achieved good accuracy in finding the FL with fault resistance ranging from 0 to 5 Ω. However, these results are based on an offline study with no metering infrastructure or communication architecture. Moreover, the algorithm was not tested rigorously using a testbed setup. To the best of the authors' knowledge, there is no work in the literature that presents a solution to the FL identification problem in a smart grid environment utilising only event-driven measurements from SMs simulated in a digital real-time simulator (DRTS). The main contributions of our work are listed below:
This paper is organised as follows. In Section 2, the proposed ANN-based methodology is provided. Section 3 presents the testbed setup. Section 4 covers offline and real-time simulation results and Section 5 discusses sensitivity studies. Section 6 concludes the discussion, followed by acknowledgement and references.

Artificial neural networks
ANN is one of the main tools in machine learning that has seen exponential growth in the last decade. Its use has been fuelled by its exceptional performance in finding patterns in data that are far too complex for simple linear systems to recognise. In essence, ANNs learn to identify complex patterns in data by using examples that produce input-output mappings based on a model equation derived from the iterative training process. The training process allows the ANN to learn how to recognise patterns within the data and creates a generalised model equation based on the historical data fed into the network. An ANN is composed of a network of simple processing units, called neurones, connected in different arrangements based on their connection topology. Some of the commonly used topologies currently found in the literature are FF neural networks, recurrent neural networks, auto-encoders, and deep belief networks, among others. The fault classification and the FL identification models proposed in this paper are based on multi-layer FF neural networks, also called multi-layer perceptrons, with sigmoid activation functions (AFs). Three main layers exist in this type of ANN: the input layer, the output layer, and the hidden layer(s). As shown in Fig. 1, the input layer is the place where all the input variables, or features, are fed into the network. The output layer is where the output neurones compute the final result of the network, and the hidden layer(s) is where hidden neurones are introduced, by the network designer, to connect the input and output layers. Equation (1) describes the basic computation that takes place in each neurone:

$$f_i = \sum_{j} w_{ij}\, x_{ij} + b_i \tag{1}$$

where $w_{ij}$ is the weight associated with neurone $i$ and input $j$, $x_{ij}$ is the input value coming from input $j$ to neurone $i$, and $b_i$ is the bias associated with unit $i$.
The result of this calculation, $f_i$, is then passed through an AF in charge of producing the output of the neurone as shown below:

$$y_i = \varphi(f_i) \tag{2}$$

where $\varphi$ is the AF and $y_i$ is the final output of the neurone. Each neurone in the network performs a computation in which input values are received from other neurones, a summation of all the weighted inputs is performed, and a final result is computed as the output of the neurone by passing the result of the operation through the AF. The learning process that occurs in the proposed method consists of the iterative update of the weights ($w_{ik}$) and biases ($b_{ik}$) of the network to minimise an error function such as the mean squared error (MSE), given a set of $N$ input-output pairs $S = \{(x_1, y_1), \ldots, (x_N, y_N)\}$. This training process is a supervised iterative process that can be divided into three main steps: forward propagation, calculation of error, and backpropagation [20]. It is important to note that before the training procedure starts, all the weights and biases of the neural network need to be randomly initialised. In forward propagation, the input values or features $X = \{x_1, \ldots, x_N\}$ are propagated through the already initialised network, so that output values can be calculated as $O = \{o_1, \ldots, o_N\}$ based on (1) and (2) in each neurone of the network. The objective of this forward propagation process is to generate estimated values that can be compared with the real values of the examples, given by $Y = \{y_1, \ldots, y_N\}$, using a defined error function. For the MSE, this error is

$$E = \frac{1}{N} \sum_{n=1}^{N} \lVert y_n - o_n \rVert^2 \tag{3}$$

The final step in this process is to perform backpropagation of the calculated error through the network by using gradient descent to adjust the parameters of the neural network ($w_{ik}$, $b_{ik}$) according to

$$w_{ik} \leftarrow w_{ik} - \alpha \frac{\partial E}{\partial w_{ik}} \tag{4}$$

$$b_{ik} \leftarrow b_{ik} - \alpha \frac{\partial E}{\partial b_{ik}} \tag{5}$$

where $\alpha$ is defined as the learning rate of the neural network. This training process is iteratively executed for a predefined number of epochs or until the error is minimised. A more detailed explanation of this procedure can be found in [20].
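The neurone computation (weighted sum plus bias, followed by a sigmoid AF) and the three training steps described above can be illustrated with a minimal sketch. It trains a single sigmoid neurone on a toy logical-OR data set using full-batch gradient descent on the MSE. The data set, the learning rate of 0.5, the 2000 epochs, and all function names are assumptions chosen for illustration; they are not the networks or settings used in the paper.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, x):
    """Weighted sum of the inputs plus bias, passed through the AF."""
    f = sum(w * xj for w, xj in zip(weights, x)) + bias
    return sigmoid(f)


def mse(weights, bias, samples):
    """Mean squared error over all (input, target) pairs."""
    return sum((y - forward(weights, bias, x)) ** 2 for x, y in samples) / len(samples)


def train_step(weights, bias, samples, alpha):
    """One gradient-descent update of the weights and the bias."""
    n = len(samples)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in samples:
        o = forward(weights, bias, x)
        # dE/df for the MSE with a sigmoid output: -2 (y - o) o (1 - o) / n
        delta = -2.0 * (y - o) * o * (1.0 - o) / n
        for j, xj in enumerate(x):
            grad_w[j] += delta * xj
        grad_b += delta
    new_weights = [w - alpha * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - alpha * grad_b


# Toy data (assumption): logical OR of two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

random.seed(0)  # random initialisation of weights and bias before training
weights = [random.uniform(-0.5, 0.5) for _ in range(2)]
bias = random.uniform(-0.5, 0.5)

initial_error = mse(weights, bias, samples)
for _ in range(2000):  # predefined number of epochs
    weights, bias = train_step(weights, bias, samples, alpha=0.5)
final_error = mse(weights, bias, samples)
```

A multi-layer network repeats the same forward computation layer by layer and propagates `delta` backwards through the hidden layer(s) using the chain rule.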

Fault-type classification process
An FF neural network is proposed for identifying the 11 different types of faults that can occur in a DN. These faults are presented in Table 1. Identifying the type of fault that occurs in a DN is an essential operation for further determining the FL in the system.
Both the fault classification and FL identification processes are presented in the flowchart shown in Fig. 2. The first section of the flowchart depicts the procedure used to classify the type of fault that occurs in the DN system using a trained ANN. According to the study presented in [21], SMs connected at the end of the lines/branches of a radial DN give the most relevant information for FL identification applications. Therefore, the proposed algorithm uses event-driven fault-on voltage measurements from only the end of every line/branch, thus minimising the communication requirements. The proposed fault-classification process starts by collecting the fault-on voltages of all the nodes connected to the end of each branch. Then, these measurements are fed into the ANN specifically trained for recognising fault types in the system. The number of measurements, or features, fed into the ANN is given by the number of buses where the SMs are located, n, multiplied by the number of phases in the DN. Using this data, the proposed neural network model is capable of determining the type of fault in the system by recognising the pattern in the bus voltage values caused by the existing fault. Since this is a classification problem, one-hot encoding of all the fault types (categories) is performed to give a numerical value to each category in the output vector of the ANN. The output vector of the ANN has a size of 11, where each index indicates one of the 11 types of faults that the system can have, as shown in Table 1.
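As an illustration of the feature vector and the one-hot output encoding described above, the following sketch flattens per-bus three-phase voltage magnitudes into a vector of length 3n and encodes a fault type as a vector of size 11. The fault-type labels, their order, and the function names are assumptions made for this example; the actual categories and their indices are those of Table 1.

```python
# Hypothetical labels and ordering for the 11 fault types (see Table 1 for the real list).
FAULT_TYPES = ["A-G", "B-G", "C-G", "AB", "BC", "CA",
               "AB-G", "BC-G", "CA-G", "ABC", "ABC-G"]


def build_features(voltages_by_bus, bus_order):
    """Flatten (Va, Vb, Vc) magnitudes of the metered buses into one vector of length 3n."""
    features = []
    for bus in bus_order:
        features.extend(voltages_by_bus[bus])
    return features


def one_hot(fault_type):
    """Return a vector of size 11 with a 1 at the index of the given fault type."""
    vec = [0] * len(FAULT_TYPES)
    vec[FAULT_TYPES.index(fault_type)] = 1
    return vec


def decode(output_vector):
    """Map an ANN output vector back to the label with the highest score."""
    best = max(range(len(output_vector)), key=lambda k: output_vector[k])
    return FAULT_TYPES[best]


# Example with two metered buses (voltage magnitudes in pu, made up for illustration).
voltages = {"bus_1": [1.00, 0.99, 0.61], "bus_2": [0.99, 0.98, 0.55]}
x = build_features(voltages, ["bus_1", "bus_2"])
target = one_hot("C-G")
```

With n metered buses the classifier input has 3n entries; `decode` simply picks the largest entry of the 11-element output.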

FL identification process
The FL identification procedure is executed based on the results obtained from the fault-classification process, i.e. based on the type of fault. This process uses a set of 11 ANNs trained individually, one for each type of fault, to determine the location of the fault in the system based on the voltages measured from the nodes. The second section of the flowchart in Fig. 2 shows the process for determining where the fault is located in the DN.
Essentially, in this process, fault-on voltages are extracted based on the type of fault detected in the system, and these voltages are then fed to the specific ANN designed to locate that type of fault. For example, if a phase-C-to-ground fault is identified, only the phase C voltage magnitudes of the buses with SMs are used for estimating the FL. The input vector for the C-G fault neural network will be of size 1n, where n is the number of buses in the system with SMs, and 1 represents just one phase (phase C). Similarly, the input vector for the three-phase FL NN would be of size 3n, where 3 represents all three phases measured in each SM. All 11 neural networks are designed to produce a vector of size m, which represents the likelihood that the fault is located close to a particular node in the set of all nodes M. For example, using the IEEE 37-bus system, if a three-phase fault occurs at node 713, the output vector would indicate a value of 1 for node 713 and a value of 0 for all other nodes. Similarly, if a three-phase fault occurs in the line connecting nodes 713 and 702, the output vector would show a high value (close to 1) for node 713 or 702, depending on which node is closer to the fault. The remaining values will be very close to 0.
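The phase selection and the interpretation of the output vector described above can be sketched as follows. The label format ("C-G", "AB-G", "ABC"), the helper names, and the sample numbers are assumptions for illustration only.

```python
PHASES = ("A", "B", "C")


def faulted_phases(fault_type):
    """Phases named in a label such as 'C-G', 'AB-G' or 'ABC' (hypothetical label format)."""
    phase_part = fault_type.split("-")[0]
    return [p for p in PHASES if p in phase_part]


def select_inputs(voltages_by_bus, bus_order, fault_type):
    """Keep only the faulted-phase voltages of every metered bus (size 1n, 2n or 3n)."""
    idx = [PHASES.index(p) for p in faulted_phases(fault_type)]
    return [voltages_by_bus[bus][i] for bus in bus_order for i in idx]


def most_likely_node(output_vector, node_ids):
    """Return the node whose output entry is the largest."""
    best = max(range(len(node_ids)), key=lambda k: output_vector[k])
    return node_ids[best]


# Two metered buses, voltage magnitudes in pu (made-up values).
voltages = {"bus_1": [1.00, 0.99, 0.61], "bus_2": [0.99, 0.98, 0.55]}
order = ["bus_1", "bus_2"]

cg_inputs = select_inputs(voltages, order, "C-G")      # size 1n
abg_inputs = select_inputs(voltages, order, "AB-G")    # size 2n
abc_inputs = select_inputs(voltages, order, "ABC")     # size 3n
node = most_likely_node([0.02, 0.95, 0.03], [702, 713, 704])
```

In a full pipeline, the fault type returned by the classifier would select which of the 11 FL networks receives the reduced input vector.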

Hyperparameters for fault-type classification and FLI models
A hyperparameter is a parameter that describes how the training process will be performed and whose value must be set before the learning process begins. In contrast, the values of other parameters are derived via training. Hyperparameters are used to find the right balance between the bias and variance of the model to maintain the accuracy between training, validation, and testing sets. The values of the hyperparameters must be determined by the designer of the machine learning model and are estimated empirically or by using a more complex optimisation function such as grid-search, random search, or evolutionary optimisation, among others. In our particular case, the hyperparameters were determined empirically through a trial-and-error process. Table 2 presents the hyperparameters used to build the proposed ANNs for the IEEE 37-bus system. For fault-type classification, 48

Testbed setup for validation
This section presents an overview of the designed testbed used for testing the FL identification algorithm with real-time data. The proposed testbed includes the following features:
• Real-time power system model: It consists of a DN (test feeder) model, with detailed PV systems (transient-level model), designed to run in a real-time environment. The distribution system is monitored by SMs, and it is assumed that the measurements at the substation and PV sites are available.
• AMI: It consists of SM models and a communication layer used to perform the communication between SMs, the data concentrator (DC), and the Utility Operations Centre (UOC).
• Database and data streaming: An SQLite database is set up to store the measurements coming from the distribution system. These measurements are then fed to the ANN-based FL identification algorithm in the UOC to predict the fault type and FL.
The fundamental objective of developing the testbed is to test the proposed algorithm in a real-world-like scenario and to provide confidence to utilities using this method.

Real-time power system model inside DRTS
Power system models running in DRTSs can mimic the behaviour of real-world situations using a very small time step. Owing to the complexity of DG integration into the electrical power system, real-time simulators are useful for evaluating system dynamics [22]. The full real-time experimental setup with communication infrastructure is presented in Fig. 3. In computer 1, the IEEE 37-bus distribution system including five PV sources is modelled in RT-LAB and transferred to the real-time simulator [23]. The grid is monitored by simulated SMs placed across the network. Event-driven measurements coming from the SMs are sent using the IEEE standardised communication protocol DNP3 over transmission control protocol/Internet protocol (TCP/IP). In computer 2, OpenDNP3 is programmed to receive the measurements coming from the SMs. The measurements are then temporarily stored in SQLite (database manager). Both OpenDNP3 and SQLite are part of the DC. After the data is collected in the DC, the ANN-based FL identification algorithm running in the UOC (deployed in computer 3) imports real-time data from the DC, classifies the different types of faults, and predicts the location of each fault. Fig. 4 shows the hardware components of the described testbed.

Real-time modelling of the test feeder with PV:
The original IEEE 37 test feeder does not have interconnected DGs, and it is a delta-delta system. For our studies, five PV plants with ratings specified in Table 3 are added at buses 820, 829, 818, 831, and 811, and the system is changed from delta-delta to wye-wye to study both line and ground faults. PV systems are modelled using a transient model with inverter and control. More details on PV modelling are provided in [24].

State-space nodal (SSN) solver tools:
Two SSN solver blocks are also used to decouple the large system into smaller state-space matrices. The advantages of using SSN blocks in real time can be found in [25].

Metering locations:
According to a study presented in [21], the lowest voltages are found downstream in the feeder. On the basis of our sensitivity studies, it is also observed that the nodes … Table 4.

Table 2: Hyperparameters of the proposed ANNs (column 1: fault-type classification ANN; columns 2-12: FL identification ANNs)
inputs | 48 | 16 | 16 | 16 | 32 | 32 | 32 | 32 | 32 | 32 | 48 | 48
outputs | 11 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37
hidden layers | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
hidden neurones | 40 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | 80 | …

RMS conversion is performed on the instantaneous voltage measurements coming from the meters, which then return the analogue inputs to the SM block. The SM block can accept a single analogue input or a vector of analogue inputs that will be transferred to the DNP3 slave at the rate defined in the mask. In our case, we are using 18 SM locations, where each SM transmits event-driven RMS voltage magnitudes. In the network model of Fig. 3, '*' indicates the location of an SM model. The term event-driven used in this paper refers to the concept of transmitting and using data when an 'event' occurs in the system. In the proposed method, the event occurs when, during a fault, the voltage value goes out of bounds as described by (6). More details on the event-driven mechanism are presented next.

SM communication using DNP3 protocol:
DNP3 (IEEE 1815) is one of the most common open communication protocols used in the electric power industry. DNP3 is a protocol that defines the rules for transmitting data from point A to point B using serial and TCP/IP communications. In DNP3, devices are defined as masters, for computers located in the UOC, and outstations, for remote devices located in the field. An outstation device is in charge of collecting information from sensors and sending it back to a master device. A master device can be connected to multiple outstations and is in charge of collecting information for further transmission to other master devices or services, as well as controlling operations by issuing commands such as sending analogue output values or closing circuit breakers. One major advantage of the DNP3 protocol is that it is an event-oriented protocol, in which masters do not have to constantly scan outstations for every single data point. DNP3 operates in one of two main operating modes: event polling and unsolicited responses. In event polling, the master polls the outstations for any changes in their data on a regular scan period. If nothing has changed, the response will contain no measurement data; otherwise, the master will receive the new data and confirm it to avoid repeated data in subsequent responses. The unsolicited response operating mode, in contrast, allows an outstation to transmit data only after a change in a bounded value occurs, without a specific request having been sent from the master to the outstation device. This ability reduces the bandwidth use of the system considerably [26].

Communication between SM and DC:
Owing to the configurable nature of the communication infrastructure and the existence of the unsolicited response mode, DNP3 is a good candidate protocol for SMs in fault-type classification and FL applications. SMs in the DN are deployed as outstations that communicate with a master device using the DNP3 protocol. The SMs use the unsolicited response operating mode and only transmit data to the master when a voltage value $V$ is out of bounds, i.e. when

$$V < 0.95\ \text{pu} \quad \text{or} \quad V > 1.05\ \text{pu} \tag{6}$$

The master device is modelled using a modified version of the open-source implementation OpenDNP3. The original OpenDNP3 code was modified to have the following features:
i. Communicate with multiple outstations (SMs) in the DN inside the DRTS.
ii. Operate in unsolicited response mode.
iii. Store the received point data in the SQLite database.
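The event trigger used by the SMs can be sketched as a simple bounds check, reading the threshold as "report when a voltage is below 0.95 pu or above 1.05 pu". The function names and the dictionary layout of the report are hypothetical and only illustrate the logic; they do not reproduce the OpenDNP3 or SM block interfaces.

```python
V_MIN_PU = 0.95  # lower voltage bound in pu
V_MAX_PU = 1.05  # upper voltage bound in pu


def is_event(v_pu):
    """True when a voltage magnitude (pu) lies outside the permitted band."""
    return v_pu < V_MIN_PU or v_pu > V_MAX_PU


def unsolicited_report(meter_id, phase_voltages_pu):
    """Return the data an SM would send, or None when no event is triggered."""
    if any(is_event(v) for v in phase_voltages_pu):
        return {"meter": meter_id, "voltages_pu": list(phase_voltages_pu)}
    return None


normal = unsolicited_report("SM-01", [1.00, 0.99, 1.01])   # no transmission
faulted = unsolicited_report("SM-01", [1.00, 0.99, 0.62])  # phase C sag triggers a report
```

Under normal operation nothing is transmitted, which is what keeps the communication load low in the unsolicited response mode.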
The flowchart presented in Fig. 6 shows the DNP3 communication process for the SMs. For DNP3, the master device connects to the SQLite server and to all the outstations assigned in the network group with the objective of listening to all messages coming from these devices. Concurrently, each DNP3 outstation device is measuring its assigned values and checking if any of these values have triggered an event. If the event is triggered, the DNP3 outstation, in our case the SM, will send the measured values (voltages) to the master while expecting an acknowledge packet (ACK) from the TCP handshake to determine if the packet needs to be re-sent or not. When the master receives the measured values, it will send the ACK packet and also initiate the sending procedure of the CONFIRM DNP3 function. The master sends the CONFIRM function packet, and the outstation receives it and acknowledges the received packet by emitting an ACK to complete the TCP handshake. If the CONFIRM function code is correctly confirmed, the master will update the SQLite database with the new measured values received from the outstation. To confirm the communication between DNP3 outstation and DNP3 master, a third party tool named Communication Protocol Test Harness is also used. Test Harness is a powerful tool for testing DNP3, IEC 60870-5, and Modbus devices.
The OpenDNP3 implementation is deployed as a master device on a computer running Ubuntu Linux with a 2.4 GHz Intel Core i3 and 8 GB of RAM. This computer is connected to the DRTS through a router and a switch using TCP/IP. Fig. 4 shows the connection between OpenDNP3 and the outstations (SMs) simulated inside the DRTS.

Data concentrator
A DC is configured to store the measurements, which can be accessed by the UOC running the FL algorithm as depicted in Fig. 3. A modified version of OpenDNP3 is used as the main engine in charge of performing the SM data collection. A SQLite database is also created to store measurements acquired by the OpenDNP3 agent. Fig. 4 shows the hardware setup for DC and UOC.
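A minimal sketch of how event-driven measurements might be stored in and read back from SQLite is given below, using Python's standard `sqlite3` module. The table name, column names, and helper functions are assumptions for illustration; the schema actually used in the DC is not described in the text.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the measurement database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "meter_id TEXT NOT NULL, "
        "phase TEXT NOT NULL, "
        "voltage_pu REAL NOT NULL, "
        "received_at REAL NOT NULL)"
    )
    return conn


def store_event(conn, meter_id, voltages_pu, received_at):
    """Insert one row per phase for an event reported by a meter."""
    rows = [(meter_id, phase, v, received_at) for phase, v in zip("ABC", voltages_pu)]
    conn.executemany(
        "INSERT INTO measurements (meter_id, phase, voltage_pu, received_at) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def latest_voltages(conn, meter_id):
    """Return {phase: voltage} for the most recent event of a meter."""
    cur = conn.execute(
        "SELECT phase, voltage_pu FROM measurements "
        "WHERE meter_id = ? AND received_at = "
        "(SELECT MAX(received_at) FROM measurements WHERE meter_id = ?) "
        "ORDER BY phase",
        (meter_id, meter_id),
    )
    return dict(cur.fetchall())


conn = open_store()
store_event(conn, "SM-01", [1.00, 0.99, 0.62], received_at=1.0)
store_event(conn, "SM-01", [0.50, 0.98, 0.97], received_at=2.0)
latest = latest_voltages(conn, "SM-01")
```

Because the database is a single file (or an in-memory store, as here), the reading side only needs the file path to query the latest values.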
SQLite is used as the selected database due to its flexibility, portability, and accessibility. SQLite is a relational database management system that has the convenience of being a self-contained file-based database that is extremely portable from machine to machine without any required installation or other constraints found in different process-based server relational databases such as MySQL. This portability together with its excellent standard compatibility makes SQLite an excellent candidate for development and testing phases of any project that requires the use of databases in a dynamic environment (e.g. multiple controllers running different operating systems, applications etc.). It is important to remark that any other database

Data collection and training
To validate the proposed method, training data is generated using OpenDSS and the MATLAB COM interface. A modified version of the IEEE 37-bus system with five PV sources, as shown in Fig. 3, is modelled. As mentioned earlier, it is assumed that the fault-on voltage measurements are available from the substation and five PV sites, and SMs are placed at the end of every line/branch of the DN. This results in a total of 18 measurement points including the substation (one sample). For training purposes, all 11 types of faults are placed at all 37 nodes, one at a time, with a combination of varying fault resistances (0:0.5:10 Ω) and load variations (0.5:0.25:1.5 pu), which results in 228,327 samples. The proposed methodology is a two-step process, in which the first step is fault-type classification and the second step is FL identification. In the first step, the ANN-based classifier contains 11 different classifiers, which represent 11 types of faults. In the second step, based on the fault type, each classifier is trained individually to obtain the FL. The OpenDSS-MATLAB COM interface is the primary tool used for training the ANN and performing offline validation.

Offline validation -fault-type classification
For fault-type classification, the data set obtained from OpenDSS is split into 70% training, 15% validation, and 15% testing. The ANN-based classifier predicts the output for all 11 types of faults with 100% accuracy for fault resistances of 0-5 Ω, but the accuracy decreases to 99.5% when the fault resistance increases from 5 to 10 Ω for two line-to-line (2LL) and two line-to-ground (2LG) faults, as shown in the confusion matrices presented in Fig. 7. The confusion matrix here shows the result obtained on the test data. For example, in row 4 of Fig. 7b, for AB-G faults, 99.6% of the faults are classified accurately by the ANN as AB-G faults and the remaining 0.4% are classified as AB faults.
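The 70%/15%/15% split mentioned above can be sketched as a shuffled partition of the sample list. The shuffling, the fixed seed, and the function name are assumptions for illustration; the text does not state how the split was drawn.

```python
import random


def split_dataset(samples, train=0.70, val=0.15, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle (assumed)
    n_train = int(round(train * len(shuffled)))
    n_val = int(round(val * len(shuffled)))
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train_set, val_set, test_set


train_set, val_set, test_set = split_dataset(range(1000))
```

The validation set is typically used to monitor generalisation during training, while the reported accuracies are computed on the held-out test set.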

Offline validation -FL identification
After the fault-type classification, each classifier is trained individually to identify the FL. For FL identification, all 11 types of faults are placed on all the nodes with a combination of varying fault resistances (0:0.5:10 Ω) and load variations (0.5:0.25:1.5 pu). Table 5 shows the accuracy obtained on test data of 11 NNs used to identify the FL. Here also, the data set is split into 70% training, 15% validation, and 15% testing for each type of fault. The classifier indicates 100% accuracy when the fault resistance is 0-5 Ω and above 98% accuracy when the fault resistance is 5-10 Ω.
In a DN, faults are not only placed on the nodes, but can occur anywhere along the line. To verify the accuracy of the proposed algorithm for faults along the lines, all types of faults are placed randomly on different lines with a random resistance value in the range of 0-10 Ω, and the data is collected only from the specified metering locations. Table 6 shows the FL identification accuracy for faults along the lines.
The faults are created at different lengths along the lines; therefore, the faulted node column shows two nodes connected to the faulted line. It is important to mention that the output vector (faulted node) would indicate a value in the range of 0-1 for the faulted node. For example, if a fault occurs close to the node 702, the output vector would show a high value (close to 1) for node 702. Conversely, if a fault occurs close to the node 703, the output vector would indicate a value (close to 1) for node 703. Similarly, if the fault occurs in the middle of the line 702-703, the output vector would indicate a value (close to 0.5) for nodes 702 and 703.
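For faults along a line, the output vector may therefore carry significant values for both end nodes. A minimal sketch of how such an output could be read is shown below: it lists every node whose score reaches a threshold, highest score first. The threshold of 0.4 and the function name are assumptions for illustration and are not part of the proposed method.

```python
def candidate_nodes(output_vector, node_ids, threshold=0.4):
    """Return nodes whose output score reaches the threshold, highest score first."""
    scored = [(score, node) for score, node in zip(output_vector, node_ids)
              if score >= threshold]
    return [node for score, node in sorted(scored, reverse=True)]


nodes = [701, 702, 703, 704]
near_702 = candidate_nodes([0.00, 0.97, 0.02, 0.01], nodes)  # fault close to node 702
mid_line = candidate_nodes([0.01, 0.52, 0.47, 0.00], nodes)  # fault near the middle of 702-703
```

Two candidates in the result indicate a fault somewhere between the two nodes rather than at either node.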

Real-time validation
To validate the performance of the proposed methodology in real time, test cases are obtained by creating faults in the real-time model running on the DRTS using the testbed described in Section 3. It is worthwhile to mention that, in this case, an electromagnetic transient model of the IEEE 37-bus system including five PVs and their control is implemented in an OPAL-RT DRTS. A simulation time step of 50 μs is used for this purpose. The purpose of the real-time simulation is to validate the results obtained offline, so that it provides confidence to the utilities to use the proposed algorithm.
… varied from 0 to 10 Ω with a 0.5 Ω interval and loading condition also varies from 0.5 to 0.5 to 1.5 pu. This results in 2331 samples from the metering points, which are then sent to the UOC using the process described in Section 3. The ANN classifier (running in the UOC) predicts every fault type with 100% accuracy other than 2L-G and L-L faults. The accuracy for 2L-G and L-L faults is also 100% when the fault resistance ranges from 0 to 5 Ω, but decreases to 99.5% when the fault resistance is in the range of 5-10 Ω, as shown in Table 7. One of the reasons for this decrease is that, at high fault resistance values, the voltage drops for L-L and 2L-G faults are nearly the same, so the classifier finds it difficult to distinguish between the two. This behaviour of the classifier is consistent with the results obtained during the offline study.

Case II: FL identification:
After fault classification, the ANN for the specific fault type is used to perform the FL identification procedure. To locate the fault, test data is obtained by simulating all 11 types of faults, one at a time, on random nodes of the test DN with varying fault resistance (0-10 Ω). The ANN predicts the FL with almost 100% accuracy when the fault resistance is <5 Ω and with above 98% accuracy when the fault resistance is 5-10 Ω, as shown in Table 8.
It was observed that, when the fault resistance is in the range of 5-10 Ω, the ANN in some cases predicted a node adjacent to the actual FL. For example, when a fault is placed on node 708 and on node 733 with a fault resistance of 10 Ω, the classifier predicts the adjacent node as the faulted node. Table 9 shows results for some of the faults used for testing.
To assess the performance of the proposed method for locating faults along the lines, all fault types with varying fault resistances (0-10 Ω) are simulated along some lines, as shown in Table 10. In this case, the effect of DG on FL identification is also considered. Although DG contributes significantly to the fault current during faults, the proposed method is based on the voltage profile; accordingly, it is observed that the method is not affected by DG operation, as shown in Table 10. The faults are created at fixed lengths; therefore, the faulted-node column shows only the one node nearest to the fault (in contrast to Table 6), except where the fault occurs exactly at the midpoint (50%) of the faulted line. In that case, the output vector indicates the two neighbouring nodes as the faulted nodes.

Effect of fault resistance
From the offline and real-time results, it can be observed that the proposed FL identification algorithm performs well when the fault resistance is in the range of 0-5 Ω, but its accuracy decreases as the resistance increases from 5 to 10 Ω. One reason is that low-impedance faults produce a significant voltage drop, whereas the voltage drop becomes less pronounced as the fault impedance increases. The effect of fault resistance on predicting the FL is presented in Fig. 8. All types of faults are placed on all 37 nodes, one at a time, with fault resistances ranging from 0 to 10 Ω in 0.5 Ω steps. The loading conditions are kept constant at 100%, which results in a total of 21 cases, each with five different types of faults and 37 different FLs. Each fault is applied for five cycles and cleared for the next five cycles, which results in a total of 6 s simulation time. Fig. 8 shows the accuracy of the proposed algorithm. Here, 100% accuracy means that the algorithm predicted all 37 nodes accurately, and 97.3% accuracy means that 36 out of 37 nodes are predicted accurately. It can be seen from Fig. 8 that the accuracy decreases when the fault resistance increases from 5 to 10 Ω.
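The accuracy figures and the simulation time quoted above follow from simple arithmetic, sketched below. The 60 Hz system frequency is an assumption (the paper does not state it in this section), and both helper functions are illustrative names introduced here.

```python
def location_accuracy(n_correct, n_nodes=37):
    """Percentage of fault nodes predicted correctly."""
    return 100.0 * n_correct / n_nodes


def simulation_time_s(n_nodes=37, cycles_on=5, cycles_off=5, freq_hz=60.0):
    """Total simulated time when each fault is applied, then cleared, node by node.

    freq_hz = 60 is an assumption, not a value stated in the text.
    """
    return n_nodes * (cycles_on + cycles_off) / freq_hz


print(round(location_accuracy(37), 1))   # all 37 nodes correct
print(round(location_accuracy(36), 1))   # one node missed
print(round(simulation_time_s(), 2))     # 370 cycles at the assumed 60 Hz
```

With one node missed, 36/37 gives 97.3%, and 37 faults of ten cycles each at the assumed 60 Hz give roughly 6.2 s, in line with the 6 s stated.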

Impact of loading conditions
Owing to uncertainties in a DN, exact values of loads are difficult to estimate at the time of the fault. To test the impact of the loading conditions on the performance of the proposed method, all types of faults are placed on all 37 nodes, one at a time, with the load varied from 0.5 to 1.5 pu in 0.25 pu steps and the fault resistance varied from 0 to 10 Ω in 5 Ω steps. This results in a total of five cases, each with five different types of faults and 37 different FLs. Each fault is applied for five cycles and cleared for the next five cycles. Again, this results in a total of 6 s simulation time. Fig. 9 shows the accuracy of the proposed algorithm for an A-G fault in the DN.
Here, 100% accuracy means that the algorithm predicted all the 37 nodes accurately and 97.3% accuracy means that 36 out of 37 nodes are predicted accurately. Fig. 9 shows the relationship between the accuracy 'Z' of the proposed method, loading conditions 'Y', and fault resistance 'X'.

Impact of measurement error
Errors in measurement may affect the accuracy of the proposed method. Therefore, it is necessary to analyse the sensitivity of the proposed method to disturbances such as noise. To examine the performance of the proposed algorithm with measurement errors and noise, inputs to the ANN obtained from previous case studies are multiplied by zero-mean Gaussian noise with 0.1, 0.5, and 1% standard deviations (SDs).
When the measurement error is within 0.1% SD, all the predicted locations are accurate. However, as the SD increases to 0.5 and 1%, some test cases identify the fault at adjacent nodes, as shown in Fig. 10. Although the accuracy of the proposed method is affected by the measurement errors, most of the results can be considered satisfactory.
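One way to reproduce this kind of noise test is sketched below. It assumes that the noise is applied multiplicatively as (1 + ε), with ε drawn from a zero-mean Gaussian whose SD is the stated percentage; the exact formulation used in the study is not given in the text, and the function name and seed handling are illustrative.

```python
import random


def add_measurement_noise(values, sd_percent, seed=None):
    """Scale each measurement by (1 + eps), eps ~ N(0, sd_percent / 100).

    The (1 + eps) form is an assumption about how the noise is applied.
    """
    rng = random.Random(seed)      # a seeded generator keeps test cases repeatable
    sd = sd_percent / 100.0        # convert a percentage SD to a fraction
    return [v * (1.0 + rng.gauss(0.0, sd)) for v in values]


# Hypothetical per-unit voltages perturbed at the three SD levels considered.
voltages_pu = [0.62, 0.98, 1.01]
noisy_sets = {sd: add_measurement_noise(voltages_pu, sd, seed=0) for sd in (0.1, 0.5, 1.0)}
```

Each noisy set can then be passed to the trained ANN in place of the clean inputs to see whether the predicted node changes.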

Conclusion
In this paper, a real-time FL identification method and a testbed are presented utilising event-driven data from SMs. An AMI is developed using DNP3 and TCP/IP for communication between SMs, DC, and UOC. To validate the proposed method in real time, a practical distribution feeder with five PV sources is modelled. Event-driven fault-on voltages from SMs located inside the DRTS are sent to the DC. The data is then fed into an ANN-based FL identification algorithm in the UOC. The algorithm classifies different types of faults in the system and determines their locations with high accuracy. Several sensitivity studies are conducted to analyse the effect of changing parameters on the algorithm output.
The proposed FL identification algorithm shows satisfactory performance for both fault-type classification and FL identification.