Smart substation network quality monitoring and fault prediction

The smart substation communication network is the basis for information sharing among the devices in a substation, and its operating status strongly affects the safe operation of the substation and of the wider power grid. Real-time status monitoring of this network is therefore increasingly important. Existing substation communication network monitoring techniques cover only a single dimension, lack real-time performance, and rely on manual fault analysis. To address these problems, this paper proposes a method for smart substation communication network state monitoring and fault prediction based on network communication quality. The method uses switch ACL technology and packet coloring technology to obtain communication quality indexes such as bandwidth utilization, delay, and packet loss rate in real time; constructs a comprehensive evaluation model of network communication quality based on a multi-dimensional evaluation algorithm; and establishes a model of the relationship between abnormal network communication quality and failures. Together these realize real-time monitoring of network communication quality and fault prediction. Application analysis in a typical 110 kV substation shows that the method can effectively evaluate network communication quality and accurately predict failures, and can guide operation and maintenance personnel in quickly restoring normal operation of the communication network.

The bandwidth utilization rate can be calculated approximately by dividing the traffic statistics of the virtual link and the physical link by the time slice.
This article uses packet coloring technology to collect statistics on characteristic packets. The process is as follows: the switch at the sending end adds a flag to the packet (coloring), and the switch at the receiving end removes the flag to restore the packet (decoloring). The sending and receiving switches each count the colored packets in the same measurement interval; the difference between the two counts gives the number of lost packets, and hence the packet loss rate. At the same time, each switch records the entry time and exit time of every colored packet and writes the accumulated delay into the reserved fields of the GOOSE, SV and MMS packets. The receiving switch calculates the delay of the packet from these reserved fields.
When network communication is normal, packets are colored only periodically, so that collecting the communication quality indexes does not affect the packet forwarding function of the switch. When the network fails, the measurement period can be shortened to obtain more index values for locating the fault. Fig.2 shows the process of coloring reserved fields in GOOSE and SV packets.
Fig.2 Process of coloring reserved fields in GOOSE, SV packets
In summary, this article uses the ACL technology and coloring technology of the switch to obtain key indexes such as the bandwidth utilization, delay, and packet loss rate of the virtual link in real time, and realizes real-time monitoring of the communication quality of both the physical link and the virtual link.
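The counting and timing steps described above can be illustrated with a minimal Python sketch. It is not the switch implementation used in this article: the names `IntervalCounters`, `packet_loss_rate`, `bandwidth_utilization` and `accumulated_delay_us` are hypothetical, and the explicit `link_capacity_bps` parameter is an assumption added so that the traffic per time slice can be expressed as a fraction of the link rate.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class IntervalCounters:
    """Counters gathered for one link in one measurement interval (hypothetical layout)."""
    tx_colored: int       # colored packets counted by the sending-end switch
    rx_colored: int       # colored packets counted by the receiving-end switch
    bytes_seen: int       # traffic counted on the link during the time slice
    slice_seconds: float  # length of the time slice


def packet_loss_rate(c: IntervalCounters) -> float:
    """Lost packets are the sent minus the received colored packets of one interval."""
    if c.tx_colored == 0:
        return 0.0  # assumption: with no colored packets sent, no loss is reported
    return (c.tx_colored - c.rx_colored) / c.tx_colored


def bandwidth_utilization(c: IntervalCounters, link_capacity_bps: float) -> float:
    """Traffic divided by the time slice, as a fraction of the assumed link capacity."""
    throughput_bps = c.bytes_seen * 8 / c.slice_seconds
    return throughput_bps / link_capacity_bps


def accumulated_delay_us(hops: Iterable[Tuple[int, int]]) -> int:
    """Sum the residence time (exit - entry, in microseconds) over every switch on the path."""
    return sum(exit_us - entry_us for entry_us, exit_us in hops)


counters = IntervalCounters(tx_colored=1000, rx_colored=990,
                            bytes_seen=1_250_000, slice_seconds=1.0)
loss = packet_loss_rate(counters)
util = bandwidth_utilization(counters, link_capacity_bps=100e6)
delay = accumulated_delay_us([(0, 5), (10, 18)])
print(loss, util, delay)
```

In this sketch the per-hop timestamps are passed in as a list; in the scheme described above the running total would instead travel inside the reserved fields of the colored packet.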

Establishment of network communication quality evaluation model
Each of the indexes selected in the previous chapter reflects the performance of the network from a different aspect. Because smart substation services are complex, an abnormality in a single index is not enough to determine accurately whether a fault exists [13], or to determine its cause and scope. To evaluate the communication quality of the smart substation network scientifically and comprehensively, this paper proposes a communication quality evaluation model. The model combines the three indexes of delay, packet loss rate and bandwidth utilization into a comprehensive evaluation value of network communication quality. The three indexes have different units and therefore different dimensions, and their values differ greatly in magnitude, so they cannot be combined directly without processing. Therefore, following the per-unit calculation method used for primary power systems, the measured value of each evaluation index is made dimensionless: it is standardized to the interval [0, 1], and the dimensionless values of all indexes form a parameter matrix. Weights are then assigned to each index according to the characteristics of the smart substation network. Finally, the comprehensive evaluation value of the smart substation network is calculated as a weighted average.

Dimensionless communication indicators
First, the evaluation standards for the three indexes are formulated by combining a large number of values measured at project sites with the requirements of standards and specifications. The evaluation standard for bandwidth utilization is shown in Table.1 below. The evaluation standard for packet transmission delay is shown in Table.2 below. For the packet loss rate, the score is 1 when there is no packet loss and 0 when there is any packet loss.

Communication index weighting
This paper assigns weights to the three indexes of delay, packet loss rate and bandwidth utilization using a weighting method based on the "function-driven" principle. The weight coefficient of each index is set according to how strongly that index influences the network communication quality. The importance of each index is graded from 1 to 9, with 9 the most important and 1 the least important.
Since the protection devices in the smart substation network place strict requirements on the transmission delay of SV messages, and the delay fluctuates markedly when network communication fails [13], the importance level of delay is set to the highest of the three. Because the smart substation network has high redundancy and good reliability, packet loss is very unlikely, and any packet loss that does occur can be found quickly with a network analyzer. The importance level of packet loss is therefore set to the lowest. Accordingly, the importance levels of delay, bandwidth utilization, and packet loss rate are set to 8, 6 and 2, respectively. According to formula (1), the weight coefficients of delay, bandwidth utilization, and packet loss rate are 0.5, 0.375 and 0.125, respectively.
Suppose a system is composed of m "modules" with importance levels $F_1, F_2, \ldots, F_m$, where $F_i$ (i = 1, 2, ..., m) is the importance level of the i-th module. The weight coefficient $w_i$ of the i-th module can then be defined as formula (1):
$$w_i = \frac{F_i}{\sum_{j=1}^{m} F_j} \tag{1}$$
In the comprehensive evaluation of the network message transmission delay index, different types of messages have different delay weights. The importance level is assigned according to packet priority. The shifted GOOSE message level is 9, the SV message level is 7, the [...]
In the comprehensive evaluation of the network bandwidth utilization index, the link between two switches is a communication trunk line, and the link between an IED device and a switch is a communication branch line. Since disconnection of a communication trunk line has a much larger influence than disconnection of a communication branch line, the importance level of the trunk line is set to 9 and that of the branch line to 6. The weight values are 0.6 and 0.4, respectively.
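The weighting rule can be checked with a short Python sketch. It assumes that formula (1) normalizes each importance level by the sum of all levels, which reproduces the coefficients quoted in the text; the function name `weight_coefficients` and the dictionary keys are hypothetical.

```python
from typing import Dict


def weight_coefficients(levels: Dict[str, int]) -> Dict[str, float]:
    """Normalize importance levels (graded 1..9) so that the weights sum to 1."""
    total = sum(levels.values())
    if total <= 0:
        raise ValueError("importance levels must sum to a positive number")
    return {name: level / total for name, level in levels.items()}


# Importance levels taken from the text: delay 8, bandwidth utilization 6, packet loss 2.
index_weights = weight_coefficients(
    {"delay": 8, "bandwidth_utilization": 6, "packet_loss_rate": 2}
)
# Trunk line 9, branch line 6.
link_weights = weight_coefficients({"trunk": 9, "branch": 6})
print(index_weights)
print(link_weights)
```

Under this assumption the three index weights come out as 0.5, 0.375 and 0.125, and the trunk and branch weights as 0.6 and 0.4, matching the values stated above.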

Communication quality evaluation calculation
Suppose each measurement index has k measurement points (virtual links) and each measurement point is sampled at t time instants, giving k*t samples in total. The corresponding matrix is denoted V, as shown in matrix (2).
(2)
The measured value of each index is made dimensionless according to the index evaluation standard, giving the relative value of each index. This paper then uses a weighted average algorithm to calculate the comprehensive evaluation value of the communication quality at a given measurement point at a given time. The calculation method is shown in formula (3).
(3)
In formula (3), $P_j$ is the communication quality evaluation value of the j-th measurement point, m is the total number of evaluation indexes, n is the number of positive indexes among them, and m-n is the number of inverse indexes. $w_i$ is the weight of the i-th index, $v_{ij}$ is the normalized relative value of the i-th positive index at the j-th measurement point, and $v_{ij}'$ is the normalized relative value of the i-th inverse index at the j-th measurement point. A positive index is one for which a larger value is better, such as available bandwidth; an inverse index is one for which a smaller value is better, such as delay and packet loss rate.
(4)
In formula (4), $v_{ij}$ is the measured value of the i-th index at the j-th measurement point, min is the minimum of that measured value over the t samples, and max is the maximum of that measured value over the t samples.
The comprehensive evaluation value of network communication quality at a given moment is obtained by weighting the comprehensive evaluation values of all measurement points in the network. The calculation formula is shown in equation (5).
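The evaluation chain described above can be sketched in Python. The exact forms of formulas (3) to (5) are not reproduced in this text, so the sketch rests on stated assumptions: conventional min-max normalization over the t samples, with inverse indexes mapped as one minus the relative value, and a plain weighted average at both the point level and the network level. The function names, the handling of a constant sample series, and the reuse of the 0.375 bandwidth weight for available bandwidth are all illustrative choices, not the authors' definitions.

```python
from typing import Dict, Sequence


def normalize(value: float, samples: Sequence[float], inverse: bool) -> float:
    """Min-max normalize one measurement against the samples of the same index and point."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 1.0  # assumption: a constant series is scored as fully healthy
    relative = (value - lo) / (hi - lo)
    # Inverse indexes (smaller is better) are flipped so that 1 is always "good".
    return 1.0 - relative if inverse else relative


def point_score(relative_values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the normalized indexes at one measurement point."""
    return sum(weights[name] * relative_values[name] for name in weights)


def network_score(point_scores: Sequence[float], point_weights: Sequence[float]) -> float:
    """Weighted average of the point scores over all measurement points."""
    weighted = sum(w * p for w, p in zip(point_weights, point_scores))
    return weighted / sum(point_weights)


weights = {"delay": 0.5, "available_bandwidth": 0.375, "packet_loss_rate": 0.125}
relative = {
    "delay": normalize(2.0, [1.0, 2.0, 3.0], inverse=True),
    "available_bandwidth": normalize(100.0, [20.0, 60.0, 100.0], inverse=False),
    "packet_loss_rate": normalize(0.0, [0.0, 0.0, 0.0], inverse=True),
}
score = point_score(relative, weights)
overall = network_score([score, 0.25], [0.6, 0.4])
print(score, overall)
```

Keeping normalization separate from weighting means the table-based scores of the evaluation standards could be substituted for `normalize` without touching the weighted-average step.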

Threshold setting adjustment
The fault prediction module first detects the communication quality parameters of the entire station, using either passive or active detection. Passive detection is the periodic detection of the communication quality of all virtual links and physical links in the station. Active detection targets one or several specific virtual links and physical links, and is triggered when the network fails or when engineering needs require it. The measured index values are fed into the communication quality evaluation model. The calculated evaluation results are merged, then filtered and analyzed. The merged result is used to judge whether an abnormal event has occurred, and the cause of the failure is predicted from the correlation between abnormal communication quality indexes and communication failures. Finally, a fault alarm is generated. Depending on the accuracy of the fault alarms, maintenance personnel can adjust the fault prediction threshold. After adjusting the threshold, they need to verify the overall status of the network to ensure that failure prediction remains correct. The general process of threshold setting is shown in Fig.3.

Judgement of the cause of failure
Before a failure of the smart substation network (such as equipment damage, cable degradation, or a broadcast storm), the network communication quality becomes abnormal. Because abnormalities in the communication quality indexes are correlated with communication failures, the cause and scope of a failure can be predicted from them. Our earlier work [11][12] also proposed a fault diagnosis model, but it focuses on physical topology and virtual link connection faults and cannot diagnose complex communication network faults. This article further studies the relationship between abnormal communication quality indexes and common network communication failures.
Common smart substation communication failures include physical link failures (line disconnection, line aging), switch equipment failures (crash, power-off restart, high working temperature, etc.), network storms (ring network storm), and network congestion (SYN Flood storm, ARP request storm).
When the comprehensive evaluation value of network communication quality is abnormal, check the packet loss rate, bandwidth utilization, and delay indexes of each physical link and virtual link. The following abbreviations are used: loss for packet loss rate, band for bandwidth utilization, delay for delay, IED for device, SW for switch, SL for a single physical link, ML for multiple physical links, and VL for virtual link. The fault is predicted and diagnosed according to the following steps; the process is shown in Fig.4.
Fig.4 Process of failure prediction
(1) If the packet loss rate of a single physical link is 100%, the cause of the failure is judged to be a disconnection of the physical link (cable disconnection or interface disconnection);
(2) If the packet loss rate of a single physical link is not 100% and the delay change is small, the cause of the fault is judged to be line aging or a loose interface;
(3) If the packet loss rate of multiple physical links is 100%, and these physical links are connected to the same device, the cause of the failure is judged to be an abnormal device (halt or communication interruption);
(4) If the packet loss rate of multiple physical links is not 100% and the delay change is small, and these physical links are connected to the same device, the cause of the failure is judged to be an abnormal device (device restart, or packet loss due to excessive temperature);
(5) If the bandwidth utilization of the physical link gradually increases, and the bandwidth utilization and delay of each virtual link do not increase significantly, the cause of the fault is judged to be network congestion (non-business packets);
(6) If the bandwidth utilization of the physical link gradually increases, and the bandwidth utilization and delay of each virtual link also increase significantly, the cause of the fault is judged to be a ring network storm;
(7) If the bandwidth utilization of the physical link gradually increases, the bandwidth utilization and delay of the GOOSE virtual link increase significantly, and the bandwidth utilization and delay of the SV virtual link do not increase significantly, the cause of the phenomenon is a burst of GOOSE packets. This is not a fault.
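The seven judgement steps above can be read as a small rule table, sketched here in Python. This is an illustration under assumptions rather than the implementation used in the pilot project: the `Observation` fields and the `diagnose` function are hypothetical, every "increases significantly" or "delay change is small" condition is reduced to a boolean that some upstream threshold check is assumed to provide, and step (4) is interpreted as "not every affected link is at 100% loss".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Abnormal-index summary for one diagnosis pass (hypothetical structure)."""
    link_loss: List[float] = field(default_factory=list)  # loss rate (0..1) per physical link
    same_device: bool = False           # affected links all attach to one device
    delay_change_small: bool = True
    physical_band_rising: bool = False  # physical-link bandwidth utilization keeps rising
    goose_vl_rising: bool = False       # GOOSE VL bandwidth utilization and delay rise markedly
    sv_vl_rising: bool = False          # SV VL bandwidth utilization and delay rise markedly


def diagnose(o: Observation) -> str:
    lossy = [rate for rate in o.link_loss if rate > 0]
    if len(lossy) == 1:
        if lossy[0] >= 1.0:
            return "physical link disconnection"                      # step (1)
        if o.delay_change_small:
            return "line aging or loose interface"                    # step (2)
    elif len(lossy) > 1 and o.same_device:
        if all(rate >= 1.0 for rate in lossy):
            return "device abnormal (halt or communication interruption)"  # step (3)
        if o.delay_change_small:
            return "device abnormal (restart or over-temperature)"    # step (4)
    if o.physical_band_rising:
        if o.goose_vl_rising and o.sv_vl_rising:
            return "ring network storm"                               # step (6)
        if o.goose_vl_rising:
            return "GOOSE burst (not a fault)"                        # step (7)
        if not o.sv_vl_rising:
            return "network congestion (non-business packets)"        # step (5)
    return "unknown"


print(diagnose(Observation(link_loss=[1.0])))
print(diagnose(Observation(physical_band_rising=True, goose_vl_rising=True, sv_vl_rising=True)))
```

Combinations that match none of the seven steps fall through to "unknown", which is where newly characterized failures could be added as further rules.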
The above process analyzes the relationship between common faults in smart substation networks and abnormal communication parameters. For unknown or unusual failures, the abnormal values of the communication quality indexes can be collected when the failure occurs, and the characteristic changes in communication quality associated with that fault can be obtained from them. In this way, the scope of the failure prediction model is continuously expanded.

Case verification
The research results of this paper have been successfully used in the pilot project of the 110 kV substation in Guanyinqiao, Sichuan. This paper uses the data collected during the trial operation to verify the effectiveness of the method. The topological structure of the process layer network of the station is shown in Fig.5. The process layer network adopts a star connection and consists of five subnets: two subnets for the 110 kV lines, two subnets for the main transformers, and one subnet for the bus. In Fig.5, the twenty-one ports connecting the IEDs and the switches are numbered in sequence, and the IED devices connected to each port number are shown in Table.3.
[...] Table.5. Through comparison, it is found that the network communication quality index drops significantly when a fault occurs. According to the fault prediction method in this paper, because the bandwidth and delay of each virtual link are increasing, the fault is judged to be a ring network storm. Following the failure prediction result, the operation and maintenance personnel found that the line 1# switch and the main transformer 1# switch were interconnected, forming a ring network. This case confirms the accuracy of the failure prediction. The method of smart substation network communication quality monitoring and fault prediction in this article can therefore scientifically evaluate the network communication quality and accurately predict the cause of the fault. It can help operation and maintenance personnel to find the fault location quickly, restore normal operation of the process layer network quickly, and improve the operational reliability of the secondary system of the smart substation.

Conclusion
This paper studies a method of smart substation network communication quality monitoring and fault prediction. The method obtains the communication parameters of the virtual links and physical links of the packets in real time through the switch, constructs a network communication quality evaluation model, and predicts the cause and scope of a failure from abnormalities in the network communication quality evaluation indexes. The case results show that the method can effectively improve the operation and maintenance efficiency of smart substations, give prompt warning of abnormal conditions, and improve the reliability of the communication network.
The method proposed in this paper provides a new perspective for smart substation network monitoring and fault diagnosis and has broad prospects in engineering applications. In future research, we will monitor more communication quality indexes and expand the fault models to improve the comprehensiveness and accuracy of smart substation network communication quality detection and fault prediction.