Partial Discharge Diagnosis with Siamese Fusion Network

Partial discharge (PD) is a common fault type in the operation of power equipment. Recently, deep learning methods have shown great potential in PD diagnosis. These methods learn a mapping between input and output from a large number of training samples. Because PD samples are scarce, it is hard to train such a classification model, which makes it challenging to apply traditional deep learning methods to PD diagnosis. To overcome this issue, this paper introduces a siamese fusion network (SFN) to diagnose PD. The method comprises three main steps. First, an ultra-high-frequency (UHF) sensor generates two spectrums from power equipment: phase resolved partial discharge (PRPD) and phase resolved pulse sequence (PRPS). Based on a few-shot learning strategy, a support set is constructed that contains four different types of PD samples and a normal sample. Then, two siamese networks are employed to estimate similarity scores between a test sample and the support set samples. One network measures similarity scores in PRPS, and the other measures similarity scores in PRPD. Based on the similarity scores, initial diagnosis results are generated. Last, a simple and effective decision fusion technique fuses the initial diagnosis results, so that the final diagnosis result jointly exploits the complementary information in the two spectrums. Experimental results show that, with limited training samples, the proposed SFN method achieves outstanding diagnosis performance compared with several classical classification methods.


I. INTRODUCTION
When power equipment operates for a long time, various faults inevitably occur. Among these faults, partial discharge (PD) is a serious fault type [1]. If this fault is not detected in time, the power equipment will deteriorate. Ultimately, PD will develop into a discharge breakdown or spark discharge, leading to tremendous economic losses [2], [3].
To identify PD, several methods have been designed, such as the conventional pulse current method, the high-frequency current method, the ultra-high-frequency (UHF) method, the acoustic emission (AE) method, and the dissolved gas analysis (DGA) method [4]. Among these, the conventional pulse current and DGA methods fail to locate PD sources, and the acoustic method is not suitable for diagnosing faults inside power equipment. The UHF method, which can diagnose the PD fault type, location, and risk degree, is widely used for analyzing PD signals [5]. After operators obtain UHF spectrums, they need to analyze each sample manually and decide on the PD type according to the characteristics of the phase resolved pulse sequence (PRPS) and phase resolved partial discharge (PRPD) spectrums. However, with the rapid development of power systems, the corresponding operation tasks are increasing. Manual analysis has disadvantages such as low efficiency and a low detection rate, so it is hard to meet the needs of an intelligent power grid.
To overcome this problem, some deep learning methods have been proposed. These methods [6]-[20] learn a mapping between input and output by using many training samples. Then, they can distinguish different PD fault types. Li et al. propose a convolutional neural network (CNN) [7] with a deep architecture to recognize UHF spectrums in gas-insulated switchgear (GIS). The work [12] introduces a CNN-based PD classification system using transfer learning, which can reduce the noise in PD samples. Wan et al. [13] use long short-term memory (LSTM) and a recurrent neural network (RNN) to diagnose PD in GIS. A stacked denoising auto-encoder method is proposed for PD diagnosis in different voltage cables of insulators [15]. Moreover, some traditional machine learning methods are proposed for PD diagnosis. For example, a multi-kernel multi-class relevance vector machine [16] is designed for PD diagnosis. Yang et al. [17] propose a low-rank radial basis function network to diagnose PD signals by suppressing narrow-band noise. Qu et al. [18] combine the discrete wavelet transform and LSTM techniques to diagnose PD faults. A fuzzy-theory-based method [19] has a simple model structure and achieves fast diagnosis. The classical support vector machine (SVM) [20] is also applied in PD diagnosis.
It should be noted that previous works need a great number of PD samples to build classification models. However, PD samples are usually very scarce, and it is hard to collect sufficient samples. Thus, with limited training samples, these methods fail to achieve high PD diagnosis accuracy. How to use limited training samples for PD fault diagnosis is still an open question. Recently, the siamese network has proved to be an efficient tool for image classification with limited training samples [21]. This network estimates the similarity between two inputs by using a unique structure [21]-[28]. Zhang et al. introduce a dual-path siamese CNN method for image classification [22]. Gao et al. propose a siamese training structure for data augmentation, improving classification performance [23]. A deep siamese framework with multitask learning is proposed for classification [24]. He et al. develop a siamese residual network with 3-D filters for image classification [27]. Liu et al. investigate the siamese network to train a linear classifier for image classification [28]. These methods perform outstandingly with limited training samples.
Inspired by the siamese network, the motivation of this work is to diagnose PD with a limited number of training samples. This paper introduces a siamese fusion network (SFN). It consists of three main steps. First, an ultra-high-frequency (UHF) sensor produces two spectrums from power equipment: phase resolved partial discharge (PRPD) and phase resolved pulse sequence (PRPS). Based on a few-shot learning strategy, a support set is constructed that contains four different types of PD samples and a normal sample. Then, two siamese networks are employed to estimate similarity scores between a test sample and the support set samples. One network measures similarity scores in PRPS, and the other measures similarity scores in PRPD. Based on the similarity scores, initial diagnosis results are generated. Last, a simple and effective decision fusion method fuses the diagnosis results. By jointly capturing complementary information in the different spectrums, the final diagnosis result is generated. The main contributions of this work are as follows:
(1) With a new strategic perspective, PD diagnosis is modeled as a few-shot learning based classification problem. With limited training samples, a siamese fusion network identifies PD fault types. To our knowledge, the siamese network is applied in PD diagnosis for the first time.
(2) The proposed method fuses complementary information of different spectrums. It is demonstrated that the fusion of complementary features can result in an outstanding improvement in diagnosis performance.
The rest of this work is organized as follows. Several partial discharge types and the siamese network model are reviewed in Section II. The proposed SFN method is introduced in Section III. Section IV presents the experimental results and analysis. Conclusions are given in Section V.

A. PARTIAL DISCHARGE TYPES
Ultra-high-frequency (UHF) sensors can obtain electromagnetic wave spectrums with frequencies ranging from 300 MHz to 3 GHz. The UHF sensor can sense and record the characteristic information of the UHF partial discharge signal, including amplitude, frequency, phase, etc. PRPS and PRPD spectrums are generated based on this feature information. The UHF sensor converts UHF signals into electrical signals. Then, the electrical signals are converted into digital signals by the UHF signal acquisition and processing unit. The digital signals can be stored in the computer. Fig. 1 shows the UHF partial discharge detection system [29].
Each PD type usually has distinctive characteristics in the PRPS and PRPD spectrums [30]. Specifically, PRPD can reflect the relationship between multiple cycles, discharge spectrum amplitude, and discharge frequency [31]. This spectrum is generated by using a wavelet transform or the Hilbert-Huang transform [32]. It can serve as a classification basis for various discharge types. In addition, PRPS is a 3-dimensional distribution of discharge amplitudes with respect to phase and power frequency cycle. This spectrum can capture local discharge information such as phase distribution. There are four PD types: corona discharge, suspended potential discharge, free metal particle discharge, and insulation gap discharge. Fig. 2 displays the spectrums of the free metal particle discharge. In Fig. 2(a), the polarity is not obvious. In Fig. 2(b), the power distribution is dispersed, and the discharge interval is unstable. The spectrums of the suspended potential discharge are shown in Fig. 3. The polarity is more obvious, and the pulse is relatively stable. Besides, the discharge interval is distributed periodically. Fig. 4 illustrates the spectrums of the insulation gap discharge. It does not have an obvious polarity effect, and the pulse amplitude is more dispersed. The spectrums of the corona discharge are presented in Fig. 5. Its polarity is obvious, and the pulse amplitude is dispersed.

B. SIAMESE NETWORK
The siamese network is a unique network architecture. Traditional deep learning models aim to classify input images, whereas the siamese network focuses on exploiting the similarity between input images. Specifically, a siamese network usually contains two identical sister networks. The outputs of the last layers of the sister networks are fed to a contrastive loss function, which estimates the similarity between the two input images. The architecture of the siamese network is shown in Fig. 6. Each image is fed to one of the two networks.
According to the reference work [21], the siamese network is optimized by using a contrastive loss function. This loss function expresses the matching degree of paired samples well. Besides, it can be used to train the feature extraction model. The loss function [21] is as follows:
$$L = (1 - Y)\,\frac{1}{2}(D_w)^2 + Y\,\frac{1}{2}\left[\max(0,\, m - D_w)\right]^2 \tag{1}$$
where $D_w$ is the Euclidean distance between the outputs of the siamese networks, and $m$ is the margin. When $Y = 1$ (i.e., the samples are not similar), only $\frac{1}{2}[\max(0, m - D_w)]^2$ remains in the loss function. If the Euclidean distance in the feature space is small, the loss value becomes larger. When $Y = 0$ (i.e., the samples are similar), the loss function is $\frac{1}{2}(D_w)^2$. If the Euclidean distance in the feature space is large, it indicates that the current model is not good, and the loss is increased.
The Euclidean distance is represented as follows:
$$D_w(X_1, X_2) = \lVert G_w(X_1) - G_w(X_2) \rVert_2 \tag{2}$$
where $X_1$ and $X_2$ are the two inputs. $G_w(X_1)$ is the output deep feature of $X_1$, and $G_w(X_2)$ is the output deep feature of $X_2$.
Fig. 7 shows the flowchart of the proposed SFN method.
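As an illustration of the contrastive loss and the Euclidean distance reviewed in this subsection, the following is a minimal sketch in plain Python. It assumes the form in which the squared-distance term is active for Y = 0 and the margin term is active for Y = 1. The function names, the toy feature vectors, and the margin value of 1.0 are hypothetical and are not taken from [21] or from this work.

```python
import math


def euclidean_distance(f1, f2):
    """D_w: Euclidean distance between two deep feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss for one pair of feature vectors.

    Assumed form: (1 - y) * 0.5 * D_w^2 + y * 0.5 * max(0, margin - D_w)^2.
    The margin of 1.0 is an illustrative default, not a value from the paper.
    """
    d = euclidean_distance(f1, f2)
    return (1 - y) * 0.5 * d ** 2 + y * 0.5 * max(0.0, margin - d) ** 2


# Toy 2-D features standing in for G_w(X_1) and G_w(X_2).
origin = [0.0, 0.0]
close_feature = [0.3, 0.4]   # distance about 0.5 from the origin
far_feature = [3.0, 4.0]     # distance 5.0 from the origin

# With y = 1 only the margin term is active: a pair closer than the
# margin is penalized, while a pair beyond the margin costs nothing.
loss_close_y1 = contrastive_loss(origin, close_feature, y=1)
loss_far_y1 = contrastive_loss(origin, far_feature, y=1)

# With y = 0 only the squared-distance term is active: the loss grows
# with the distance between the two features.
loss_far_y0 = contrastive_loss(origin, far_feature, y=0)
```

In this sketch the margin term vanishes once the distance exceeds the margin, so pairs that are already far apart do not contribute to training.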

III. THE PROPOSED SFN METHOD
The SFN method consists of three main steps. First, based on a few-shot learning strategy, a support set is constructed, containing some labeled samples, such as four PD types and a normal type. Each sample has PRPS and PRPD spectrums, obtained by the UHF sensor. Then, two siamese networks are employed to estimate the similarity between a test sample and support set samples. One network measures the similarity in PRPS, and the other measures similarity in PRPD. According to two similarity scores, initial diagnosis results can be generated. Last, a simple and effective decision fusion method is designed to fuse initial diagnosis results. By jointly exploiting complementary information, the final diagnosis result is generated.

A. SUPPORT SET GENERATION
The UHF sensor is employed to obtain PRPD and PRPS spectrums. A test sample $X_t$ contains the PRPD spectrum $X_t^{PRPD}$ and the PRPS spectrum $X_t^{PRPS}$:
$$X_t = \{X_t^{PRPD},\, X_t^{PRPS}\} \tag{3}$$
A support set is constructed, containing a few labeled samples. It includes corona discharge, suspended potential discharge, free metal particle discharge, insulation gap discharge, and normal samples. Each type has one image. Each support set sample $X_s(i)$ also contains the corresponding PRPD spectrum $X_s^{PRPD}(i)$ and the corresponding PRPS spectrum $X_s^{PRPS}(i)$:
$$X_s(i) = \{X_s^{PRPD}(i),\, X_s^{PRPS}(i)\} \tag{4}$$
where $i = 1, 2, 3, 4, 5$. $X_s(1)$ represents the corona discharge sample, $X_s(2)$ the suspended potential discharge sample, $X_s(3)$ the free metal particle discharge sample, $X_s(4)$ the insulation gap discharge sample, and $X_s(5)$ the normal sample.

B. SIAMESE NETWORK-BASED SIMILARITY ESTIMATION
In this step, two siamese networks are employed to estimate the similarity between the test sample and the support set samples. One siamese network measures the similarity of the PRPD spectrum, and the other estimates the similarity of the PRPS spectrum. In Fig. 8, each siamese network has two VGG-16 models with the same network structure and shared parameters. The VGG-16 model has multiple convolution layers, a sigmoid activation function, a mean-pooling layer, and a fully connected layer. The test sample and a support sample serve as the two input images, from which the deep feature information is captured. The input spectrums are fed to the VGG-16 model, and the output deep information is obtained:
$$O_t = G(X_t) \tag{5}$$
where $G(X_t)$ represents the step that uses the VGG-16 model to extract the deep feature information from the test sample $X_t$. $O_t$ is the output deep information of $X_t$, including two elements:
$$O_t = \{O_t^{PRPD},\, O_t^{PRPS}\} \tag{6}$$
Similarly, $G(X_s(i))$ represents the step that adopts the VGG-16 model to capture the deep feature information from the support set samples $X_s(i)$. It is represented as follows:
$$O_s(i) = G(X_s(i)) \tag{7}$$
where $O_s(i)$ is the output deep information of $X_s(i)$, and it contains two elements:
$$O_s(i) = \{O_s^{PRPD}(i),\, O_s^{PRPS}(i)\} \tag{8}$$
Taking advantage of its generalization ability, the VGG-16 model is adopted to exploit the deep information.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
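Eq. (2) measures the PRPD similarity score:
$$E_{PRPD}(i) = \lVert O_t^{PRPD} - O_s^{PRPD}(i) \rVert_2 \tag{9}$$
Similarly, the PRPS similarity score is defined as follows:
$$E_{PRPS}(i) = \lVert O_t^{PRPS} - O_s^{PRPS}(i) \rVert_2 \tag{10}$$
The output values are normalized to [0, 1].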
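The similarity estimation for one spectrum can be sketched as follows. This is only an illustrative sketch: the short feature vectors stand in for VGG-16 outputs, and the normalization (dividing by the largest distance) is an assumption, since the text states only that the scores are normalized to [0, 1]. All names are hypothetical.

```python
import math

# Support set order follows X_s(1) ... X_s(5) in Section III-A.
CLASSES = [
    "corona",
    "suspended potential",
    "free metal particle",
    "insulation gap",
    "normal",
]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_scores(test_feature, support_features):
    """Distance from the test feature to each support feature, scaled to [0, 1].

    Assumption: scaling divides by the largest distance. A lower score
    means the test sample is closer to that support sample.
    """
    raw = [distance(test_feature, s) for s in support_features]
    top = max(raw)
    return [r / top if top > 0 else 0.0 for r in raw]


# Toy 2-D features, one per support class (placeholders for deep features).
support = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
test = [0.9, 0.1]

scores = similarity_scores(test, support)
initial_diagnosis = CLASSES[scores.index(min(scores))]
```

Because the scores are distances, the initial diagnosis for a single spectrum is the support class with the lowest score.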

C. DECISION FUSION BASED PD DIAGNOSIS
Since the UHF signal is complex and varied, a single spectrum fails to reflect the UHF signal fully, and diagnosis with a single spectrum cannot achieve outstanding accuracy. The PRPD and PRPS spectrums reflect the UHF signal from different views, since the two spectrums are from the same test sample. There is strong complementary information in the PRPD and PRPS spectrums. To exploit this complementary information, a simple and effective decision fusion technique is adopted to combine the two similarity scores:
$$E(i) = E_{PRPD}(i) + E_{PRPS}(i) \tag{11}$$
where $E(i)$ is the sum of the two similarity scores, $E_{PRPD}(i)$ is the PRPD similarity score between the test sample and the support set samples, and $E_{PRPS}(i)$ is the PRPS similarity score between the test sample and the support set samples. By jointly utilizing complementary information, the proposed method can generate more accurate diagnosis results. Specifically, the lowest score is obtained as follows:
$$\mathrm{Softmin} = \min_{i} E(i) \tag{12}$$
where $i = 1, 2, 3, 4, 5$. Softmin is the lowest score in the results. The output result is the PD category of the corresponding support set sample.
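The decision fusion step described above can be sketched in a few lines. The score values below are made up for illustration, and the function and variable names are hypothetical. The sketch only shows the sum of the two scores followed by selection of the lowest fused score.

```python
CLASSES = [
    "corona",
    "suspended potential",
    "free metal particle",
    "insulation gap",
    "normal",
]


def fuse_and_diagnose(e_prpd, e_prps, classes=CLASSES):
    """Sum the two per-class scores and return the class with the lowest sum."""
    if not (len(e_prpd) == len(e_prps) == len(classes)):
        raise ValueError("score lists and class list must have equal length")
    fused = [d + s for d, s in zip(e_prpd, e_prps)]
    best = min(range(len(fused)), key=fused.__getitem__)
    return classes[best], fused


# Illustrative scores only: the PRPD scores alone favor class 2,
# while the PRPS scores alone favor class 1.
e_prpd = [0.40, 0.35, 0.90, 0.70, 1.00]
e_prps = [0.20, 0.60, 0.80, 0.75, 1.00]

label, fused = fuse_and_diagnose(e_prpd, e_prps)
```

In this toy case the two spectrums disagree when used alone, and the fused score settles the decision in favor of the class that is close in both spectrums.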

A. EXPERIMENTAL SETUP
A challenging UHF data set is collected to evaluate the performance of the SFN method. The experimental data set is generated with a UHF sensor produced by the document management system (DMS) company in the UK. This sensor detects the partial discharge of GIS equipment in 220 kV substations in Hunan Province, China. The data set contains 320 images, provided by the State Grid Corporation of China. Some PD samples are shown in Fig. 9, and the details are given in Table I. In this data set, the corona discharge has 60 images, the suspended potential discharge has 35 images, the free metal particle discharge has 50 images, the insulation gap discharge has 45 images, and the normal class has 130 images. Among them, 50 labeled samples are randomly chosen for training, and the remaining 270 labeled samples are used for testing. The classification process is repeated 10 times, and the average is taken as the final result. Further details can be found in [36].
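The repeated random split described above can be sketched as follows. The class sizes are those listed for the data set. The sketch assumes that the 50 training samples are drawn as 10 per class, which the text does not state explicitly, and it uses a fixed seed only to make the illustration reproducible. The function name and the sample identifiers are hypothetical.

```python
import random

# Number of images per class, as listed for the data set.
CLASS_COUNTS = {
    "corona": 60,
    "suspended potential": 35,
    "free metal particle": 50,
    "insulation gap": 45,
    "normal": 130,
}


def split_once(counts, per_class, rng):
    """Randomly pick `per_class` training samples from each class.

    Samples are represented as (label, index) pairs. Assumption: the
    training set is balanced across classes.
    """
    train, test = [], []
    for label, n in counts.items():
        ids = list(range(n))
        rng.shuffle(ids)
        train += [(label, i) for i in ids[:per_class]]
        test += [(label, i) for i in ids[per_class:]]
    return train, test


rng = random.Random(0)  # fixed seed, for a reproducible illustration only
splits = [split_once(CLASS_COUNTS, per_class=10, rng=rng) for _ in range(10)]

# Each repetition would train on its 50 samples and test on the other 270;
# the reported accuracy is then the mean over the 10 repetitions.
```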

B. EXPERIMENTS ANALYSIS
In the experiments, the overall accuracy (OA) and average accuracy (AA) [36]-[38] are adopted to judge the diagnosis performance. They are widely used evaluation indexes in image classification [38]. Table II shows the classification accuracy obtained by different methods on the PRPD spectrum. The OA values of the SVM, VGG-16, and AlexNet models are 35.19%, 44.81%, and 45.19%, respectively. This demonstrates that traditional models fail to achieve outstanding classification performance with limited training samples. The SFN-WO and SFN methods are at least 36.27% higher than the other three compared methods, which is a great improvement in PD fault diagnosis. Specifically, for the corona discharge class, the classification accuracy improves from 42.27% to 82.08%. For the free metal particle discharge class, the classification accuracy improves from 30.23% to 87.56%. The performance of the SFN-WO method is slightly lower than that of the SFN method. The reason is that, when trained on limited PD samples, the SFN method has a more robust generalization ability in PD signal extraction than the SFN-WO method. Besides, Table III shows the classification accuracy obtained by different methods on the PRPS spectrum. The OA values of the SVM, VGG-16, and AlexNet models are 35.92%, 49.42%, and 46.01%, respectively. The SFN method is at least 34.28% higher than the other three compared methods.
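For reference, the two evaluation indexes can be computed as in the following sketch, which uses their usual definitions: OA is the fraction of all test samples that are classified correctly, and AA is the mean of the per-class accuracies. The labels below are toy values, not results from this work, and the function name is hypothetical.

```python
from collections import defaultdict


def overall_and_average_accuracy(y_true, y_pred):
    """Return (OA, AA) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    oa = sum(correct.values()) / len(y_true)
    aa = sum(correct[c] / total[c] for c in total) / len(total)
    return oa, aa


# Toy imbalanced example: 8 normal samples, 2 corona samples,
# with one corona sample misclassified as normal.
y_true = ["normal"] * 8 + ["corona"] * 2
y_pred = ["normal"] * 8 + ["normal", "corona"]

oa, aa = overall_and_average_accuracy(y_true, y_pred)
```

Because the data set is imbalanced (130 of the 320 images are normal), AA is a useful complement to OA: an error in a small class lowers AA more than it lowers OA.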
Moreover, the influence of the number of training samples on the diagnosis performance is discussed. Different numbers of samples are randomly selected from the data set to constitute the training and test sets. Fig. 10 presents the variation tendencies of the diagnosis accuracies for PRPS spectrums. The diagnosis results of the proposed SFN method are based on the different spectrums and the fusion scheme. The number of training samples per class varies from 10 to 20. For the SVM, VGG-16, and AlexNet methods, the diagnosis performance improves slightly. For the SFN methods, the diagnosis performance improves from 83.70% to 87.56%.
Furthermore, for the proposed SFN method, the diagnosis results from a single spectrum (PRPS or PRPD) are also compared with the fused result. SFN-PRPS and SFN-PRPD serve as the two compared methods. Fig. 11 shows the diagnosis accuracy of the different methods. The proposed method obtains the highest detection accuracy, 88.38%, which is higher than the 83.70% and 85.56% of the other diagnosis methods. The reason is that the two spectrums capture characteristic information from different aspects. Each diagnosis result reflects the statistical characteristics of the UHF signal. The proposed SFN method fuses the two diagnosis results and avoids some of the misclassifications that occur with a single spectrum. Therefore, the fused results can produce a more robust diagnosis result with limited training samples.

V. CONCLUSION
This paper has introduced a siamese fusion network for partial discharge diagnosis with a limited number of training samples. First, the UHF sensor generates two spectrums from power equipment: PRPD and PRPS. Based on a few-shot learning strategy, a support set is constructed that contains four different types of PD samples and a normal sample. Then, two siamese networks are employed to estimate similarity scores between a test sample and the support set samples. One network measures similarity scores in PRPS, and the other measures similarity scores in PRPD. Last, a simple and effective decision fusion technique fuses the diagnosis results. Experiments performed on the PD data set demonstrate that our method outperforms its counterparts in terms of classification accuracy with limited training samples. The proposed method could also be applied in other fields (such as insulator fault detection, tree barrier detection, and mountain fire detection). Thus, adapting the proposed method to other applications is challenging but interesting future work.