Recognition of Voltage Sag Sources Based on Phase Space Reconstruction and Improved VGG Transfer Learning

The recognition of voltage sag sources is the basis for formulating a voltage sag governance plan and clarifying the responsibility for an accident. To address this recognition problem, a method based on phase space reconstruction and improved Visual Geometry Group (VGG) transfer learning is proposed from the perspective of image classification. Firstly, phase space reconstruction is used to transform voltage sag signals into reconstruction images, from which the intuitive characteristics of different sag sources are analyzed. Secondly, the standard VGG 16 model is improved with an attention mechanism to extract the features completely and prevent over-fitting. Finally, the improved VGG model is trained with transfer learning, which improves the efficiency of model training and the recognition accuracy of sag sources. The model is trained by minimizing the cross entropy loss function. The simulation analysis verifies the effectiveness and superiority of the proposed method.


Introduction
In recent years, with the widespread use of power electronic devices in the power grid and of sensitive devices in industrial production, the impact of voltage sag has gradually attracted attention in the electrical field. Accurate recognition of the source of a voltage sag helps to formulate the governance plan in time and to divide the responsibilities of both parties in the accident clearly, effectively reducing economic losses and resolving related disputes [1].
At present, the research on the recognition of voltage sag sources falls into two categories: direct methods [2][3][4][5][6][7] and indirect methods [8][9][10][11][12][13][14][15][16][17][18][19]. The direct methods include the RMS method [2,3] and the deep learning method [4][5][6][7]. Indirect methods include two parts: feature extraction and pattern recognition. Common methods for feature extraction include wavelet transform [8,9], Fourier transform [10,11], S transform [12], Hilbert transform [13], and empirical mode decomposition [14]. The main methods of pattern recognition include neural networks [15,16], support vector machines [17], principal component analysis [18], and fuzzy comprehensive evaluation [19]. Among them, the RMS method [2,3] is simple and easy to implement, but it easily misjudges complex sag situations; the deep learning method [4][5][6][7] does not rely on manual extraction of features, but its model training efficiency is low. In the feature extraction process of indirect methods, the mathematical models are mature and the features are clear. However, indirect methods are limited to

• Short circuit fault is the main cause of voltage sag. Different short circuit faults cause different sags. The voltage sag caused by a three phase short circuit fault is equal in three-phase voltage magnitude, whereas the three-phase magnitudes of sags caused by other short circuit types differ. In an asymmetric short circuit, voltage swell may occur at the same time as the sag. The magnitude changes abruptly at the beginning and end of the voltage sag, and it does not change during the sag.
• When a large induction motor is starting, it draws a much larger current from the power supply than in normal operation. The typical starting current is 5-6 times the rated working current, which results in voltage sag. When the sag occurs, the three-phase voltages drop at the same time, and the sag magnitude is basically the same in each phase. There is no sudden change in the recovery process; the voltage recovers gradually.
• Because of the saturation characteristic of the core, the inrush current of a transformer when switched on and off is several times the rated current, which causes voltage sag. The initial phase angles of the three-phase voltages always differ by 120 degrees, so the magnitude of the three-phase sag is always unbalanced. Large transformers usually need dozens of cycles to recover because of their small resistance and large reactance. In addition, the voltage waveform of the sag contains higher harmonics.
Short circuit faults are divided, according to the fault phases, into seven types: $A_{1a}$, $A_{1b}$, $A_{1c}$, $A_{2ab}$, $A_{2ac}$, $A_{2bc}$, and $A_{3abc}$, which respectively represent A-phase short circuit, B-phase short circuit, C-phase short circuit, A-and B-phase short circuit, A-and C-phase short circuit, B-and C-phase short circuit, and three phase short circuit fault. The starting of an induction motor is recorded as $B_1$, and large transformer energizing is recorded as $C_1$. So, there are nine types of voltage sag sources.

Phase Space Reconstruction Theory
Phase space reconstruction theory holds that the development process of any component in a dynamic system implies information of other relevant components, and the original change rule of the system can be extracted and restored by analyzing the time series data of a component.
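For a one-dimensional time series $x = \{x_1, x_2, \ldots, x_N\}$, according to Takens' delay time embedding theory [30], the series is extended to a high-dimensional space by using two reconstruction parameters (delay time $\tau$ and embedding dimension $m$):

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_M \end{bmatrix} = \begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(m-1)\tau} \\ x_2 & x_{2+\tau} & \cdots & x_{2+(m-1)\tau} \\ \vdots & \vdots & & \vdots \\ x_M & x_{M+\tau} & \cdots & x_{M+(m-1)\tau} \end{bmatrix} = \begin{bmatrix} U_1 & U_2 & \cdots & U_m \end{bmatrix} \tag{1}$$

where the column vectors $U_k$ ($k = 1, 2, \ldots, m$) represent the coordinates of each dimension, the row vectors $X_k$ ($k = 1, 2, \ldots, M$) constitute the phase points in the phase space, $X$ is an $M \times m$ matrix, $X_k$ is a $1 \times m$ vector, $\tau$ is the delay time expressed in sampling intervals of the time series, and $M = N - (m - 1)\tau$. These $M$ points together constitute the phase trajectory of the reconstructed phase space of the voltage sag time series. In order to ensure the visibility of the reconstruction images, $m$ can take the values 1, 2, or 3. At the same time, the larger $m$ is, the larger the dimension of the phase space and the more information it contains. Therefore, $m = 3$ is the most appropriate. When $m = 3$, Equation (1) becomes

$$X = \begin{bmatrix} x_1 & x_{1+\tau} & x_{1+2\tau} \\ x_2 & x_{2+\tau} & x_{2+2\tau} \\ \vdots & \vdots & \vdots \\ x_M & x_{M+\tau} & x_{M+2\tau} \end{bmatrix} = \begin{bmatrix} U_1 & U_2 & U_3 \end{bmatrix} \tag{2}$$

With the C-C method [31], we determine $\tau = 2$. So, Equation (2) becomes

$$X = \begin{bmatrix} x_1 & x_3 & x_5 \\ x_2 & x_4 & x_6 \\ \vdots & \vdots & \vdots \\ x_M & x_{M+2} & x_{M+4} \end{bmatrix} = \begin{bmatrix} U_1 & U_2 & U_3 \end{bmatrix} \tag{3}$$

In this paper, the sampling interval is 1/1000 s, so $\tau = 2$ means 2/1000 s.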
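The delay embedding described above can be illustrated with a short NumPy sketch. The function name `delay_embed` and the pure 50 Hz test sinusoid are assumptions made for illustration only; they are not taken from the paper, and the sketch simply builds the $M \times m$ matrix of phase points with $m = 3$ and $\tau = 2$.

```python
import numpy as np

def delay_embed(x, m=3, tau=2):
    """Return the M x m matrix of phase points, with M = N - (m - 1) * tau.

    Hypothetical helper: column k holds the series delayed by k * tau samples.
    """
    x = np.asarray(x, dtype=float)
    M = x.size - (m - 1) * tau
    if M <= 0:
        raise ValueError("time series too short for the chosen m and tau")
    return np.column_stack([x[k * tau : k * tau + M] for k in range(m)])

# Illustrative input (an assumption): 1 s of a 50 Hz sinusoid sampled at 1 kHz.
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 50 * t)

X = delay_embed(x, m=3, tau=2)
print(X.shape)  # (996, 3)
```

Each row of `X` is one phase point; plotting the three columns against each other gives the kind of trajectory from which a reconstruction image could be drawn.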

Phase Space Reconstruction of Different Voltage Sag Signals
The fundamental frequency of the simulation model is set to 50 Hz, and the total simulation time is set to 1 s. The sampling frequency is set to 1 kHz, so each voltage signal contains 1000 sampling points. Phase space reconstruction is carried out for the voltage sag signals caused by a single phase short circuit fault, large induction motor starting, and unloaded transformer energizing. The three-dimensional coordinates are recorded as $U_x$, $U_y$, and $U_z$, which correspond to $U_1$, $U_2$, and $U_3$, respectively. The reconstructed images are shown in Figures 1-3. To obtain better quality, the time axis of the waveform plots is shortened by omitting 0.34 s during which no sag occurs, while the reconstructed images still cover the full one second of samples.
From Figures 1-3, it can be seen that the attractor of a stable sinusoidal waveform takes the form of a limit cycle in phase space, and the size of the limit cycle represents the magnitude of the sinusoidal wave. For example, when the single phase short circuit fault in Figure 1 occurs, the magnitude of phase A decreases, and a very small limit cycle appears in the corresponding reconstruction image; the magnitudes of phases B and C increase, and a correspondingly larger limit cycle appears in their reconstruction images. The larger limit cycle can be used to judge the type of fault as an A-phase short circuit fault. The voltage changes slowly when the induction motor is starting. So, in Figure 2, many limit cycles are generated, and there is no sudden change between the cycles. The most special case is Figure 3: in addition to limit cycles, there are strange attractors in the reconstruction image of transformer energizing, which are caused by high-order harmonics in the voltage signals.
Therefore, the identification of voltage sag sources based on phase space reconstruction images relies on the following main characteristics:
• The number of limit cycles
• The size of limit cycles
• The existence of strange attractors
• The number of mutation trajectories
In the case of a short circuit fault, the duration of the voltage sag appears as an overlap of the phase trajectory on the phase space reconstruction image, which does not affect its identification. The phase space reconstruction of a voltage sag retains the complete sag information and is more intuitive and distinctive than the original signal waveform.

VGG Network Structure
VGG is a deep convolutional neural network developed by researchers from the Visual Geometry Group of Oxford University and Google DeepMind. VGG 16 is a classical model for image classification [21].
As shown in Figure 4, VGG 16 is simple in structure, consisting of 13 convolution layers, 5 pooling layers, 3 fully connected layers, and 1 softmax output layer. All activation units of the hidden layers adopt the Rectified Linear Unit (ReLU) function. The convolution kernels are 3 × 3 and the pooling kernels are 2 × 2. Stacking several convolution layers with small (3 × 3) kernels reduces the number of parameters and, at the same time, introduces more non-linear mappings, which increases the fitting ability of the network. The number of channels in the first convolution block of VGG 16 is 64, and it doubles from block to block up to 512 channels. With the increase of the number of channels, more information can be extracted. In addition, the convolution layers focus on enlarging the number of channels and the pooling layers focus on narrowing the width and height, which makes the model deeper and wider in structure while controlling the amount of calculation.
The supervised pre-training of VGG can be divided into two processes: forward propagation and backward propagation. Forward propagation calculates the features at each layer as follows:

$$x^{(l)} = f\left(w^{(l)} x^{(l-1)} + b^{(l)}\right)$$

where $l$ is the current layer, $x^{(l-1)}$ is the input of the layer, $x^{(l)}$ is the output of the layer, $w$ is the weight, $b$ is the bias, and $f(\cdot)$ is the ReLU function. The equation of the ReLU function is

$$f(x) = \max(0, x)$$

The convolution operation is linear. In order to increase the non-linear ability of the neural network while keeping the training complexity low, ReLU is used as the activation function to perform a nonlinear transformation on the features after the linear operation. During backward propagation, the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$ of each layer are updated by the batch gradient descent method. The update formulas are as follows:

$$w_{ij}^{(l)} = w_{ij}^{(l)} - \alpha \frac{\partial J(w, b)}{\partial w_{ij}^{(l)}}, \qquad b_i^{(l)} = b_i^{(l)} - \alpha \frac{\partial J(w, b)}{\partial b_i^{(l)}}$$

where $\alpha$ is the learning rate, $i$ is the $i$-th sample, $j$ is the $j$-th mapping feature, and $J(\cdot)$ is the cross entropy loss function which will be mentioned in Section 3.2.
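As a rough illustration of the forward pass with ReLU and of one gradient descent update, the following NumPy sketch uses a single fully connected layer. The toy weights, the input, the target, and in particular the squared-error loss are assumptions chosen to keep the example short; the paper itself trains with the cross entropy loss and a full VGG 16 network.

```python
import numpy as np

def relu(z):
    """ReLU activation f(z) = max(0, z), applied element-wise."""
    return np.maximum(z, 0.0)

def forward(x_prev, w, b):
    """One layer of forward propagation: x_l = f(w x_{l-1} + b)."""
    return relu(w @ x_prev + b)

def squared_error(a, target):
    # Stand-in loss for illustration only (the paper uses cross entropy).
    return 0.5 * np.sum((a - target) ** 2)

# Toy values (assumptions, not from the paper).
w = np.array([[0.5, -0.2], [0.1, 0.3]])
b = np.array([0.1, 0.0])
x_prev = np.array([1.0, 2.0])
target = np.array([1.0, 0.0])
alpha = 0.1  # learning rate

a = forward(x_prev, w, b)
loss_before = squared_error(a, target)

# Backward propagation for this single layer.
z = w @ x_prev + b
dz = (a - target) * (z > 0)   # gradient through the ReLU
grad_w = np.outer(dz, x_prev)
grad_b = dz

# Gradient descent update: parameter <- parameter - alpha * gradient.
w = w - alpha * grad_w
b = b - alpha * grad_b

loss_after = squared_error(forward(x_prev, w, b), target)
print(loss_before > loss_after)  # True
```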

Cross Entropy Loss Function
Cross entropy is a concept in information theory [32]. It was originally used to estimate the average coding length. Given two probability distributions $p$ and $q$, the cross entropy of $p$ expressed by $q$ is

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

where, in machine learning, $p(x)$ is often used to describe the true distribution and $q(x)$ the distribution predicted by the model. The smaller the cross entropy $H(p, q)$, the closer the two probability distributions are. Information entropy is the expectation of all information quantities:

$$H(p) = -\sum_{x} p(x) \log p(x)$$

Relative entropy is also called Kullback-Leibler divergence. If there are two separate probability distributions $p(x)$ and $q(x)$ for the same random variable $x$, relative entropy can be used to measure the difference between the two distributions:

$$D_{KL}(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} = \sum_{x} p(x) \log p(x) - \sum_{x} p(x) \log q(x) \tag{9}$$

where the smaller the relative entropy $D_{KL}(p \| q)$ is, the closer the two probability distributions are.
Strictly, relative entropy should be used to measure the difference between probability distributions when calculating the loss. However, Equation (9) shows that:

$$\text{Relative entropy} = \text{Cross entropy} - \text{Information entropy} \tag{10}$$

Information entropy describes the amount of information needed to eliminate the uncertainty of $p$ (the true distribution), so its value is fixed for a given data set. Minimizing relative entropy is therefore equivalent to minimizing cross entropy, which is why cross entropy is a convenient loss in machine learning.
The cross entropy loss function is as follows, and the purpose of training is to minimize it:

$$J(w, b) = -\frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{n} y_{ji} \log h_{w,b}(x_{ji})$$

where $n$ is the number of categories, $m$ is the number of samples, $y_{ji}$ is the true value of sample $j$ in class $i$, and $h_{w,b}(x_{ji})$ is the predicted value of sample $j$ in class $i$.
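The relation between relative entropy, cross entropy, and information entropy can be checked numerically. The sketch below is illustrative only: the distributions `p` and `q`, the one-hot labels, and the predicted probabilities are made-up values, and the loss is averaged over samples, which is an assumption about the normalization.

```python
import numpy as np

def entropy(p):
    """Information entropy H(p) = -sum p log p (p must be strictly positive)."""
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum p log q."""
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) = sum p log(p / q)."""
    return np.sum(p * np.log(p / q))

def cross_entropy_loss(y, h):
    """Mean cross entropy over samples; y is one-hot, h holds predicted probabilities."""
    return -np.mean(np.sum(y * np.log(h), axis=1))

# Made-up distributions for illustration.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
kl, ce, h_p = kl_divergence(p, q), cross_entropy(p, q), entropy(p)
print(np.isclose(kl, ce - h_p))  # True

# Made-up batch of two samples and three classes.
y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
h = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
loss = cross_entropy_loss(y, h)
```

Because the entropy of the true distribution does not depend on the model, lowering `ce` lowers `kl` by the same amount.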

Attention Mechanism
The attention mechanism in deep learning simulates the attention model of the human brain. At present, it is widely used in machine translation [26], speech recognition [27], image captioning [28], and many other fields. The attention mechanism can select the key information from a large amount of information and suppress irrelevant information, which helps to avoid over-fitting. Channel attention pays more attention to the global characteristics, while spatial attention is better at capturing local features, so the mixed attention called Convolutional Block Attention Module (CBAM), which combines the channel and spatial dimensions, can better depict the complete information [25]. When $F$ is input, the attention module performs the following operations, shown in Figures 5-7:

$$F' = M_c(F) \otimes F$$

$$F'' = M_s(F') \otimes F'$$

$$M_c(F) = \sigma\left(W_1\left(W_0\left(F_{avg}\right)\right) + W_1\left(W_0\left(F_{max}\right)\right)\right)$$

$$M_s(F') = \sigma\left(f^{7 \times 7}\left(\left[F'_{avg}; F'_{max}\right]\right)\right)$$

where $M_c(\cdot)$ denotes attention extraction on the channel dimension as shown in Figure 6, and $M_s(\cdot)$ denotes attention extraction on the spatial dimension as shown in Figure 7. $\otimes$ represents element-wise multiplication of two matrices, $W_0$ and $W_1$ represent the connection weights of the multi-layer perceptron (MLP) using the ReLU activation function, $\sigma$ represents the sigmoid activation function, $f^{7 \times 7}$ represents a convolution with a 7 × 7 kernel, $F_{avg}$ and $F_{max}$ represent $F$ after the average pooling and maximum pooling operations, and $F'_{avg}$ and $F'_{max}$ represent $F'$ after the average pooling and maximum pooling operations.
In the channel attention module, maximum pooling and average pooling are used simultaneously, and their outputs pass through a shared MLP. The result of the $M_c(\cdot)$ operation is obtained by summing the two outputs of the MLP. The input $F$ is multiplied by the result of the $M_c(\cdot)$ operation to get the result $F'$ of channel attention, which is 224 × 224 × 3.
In the spatial attention module, maximum pooling and average pooling are applied to $F'$ simultaneously, and a feature with 2 channels is obtained by concatenating the two pooled results. In order to get the importance of different pixels of the feature maps on a plane, a single 7 × 7 convolution kernel is used to compress the 2 channels into 1 channel, so that the result of the $M_s(\cdot)$ operation has one channel, which is 224 × 224 × 1.
In the end, the result of the $M_s(\cdot)$ operation is multiplied by the result of channel attention $F'$ to get the final feature $F''$. $F''$ is consistent with $F$ in dimension, which is 224 × 224 × 3.
Adding the attention mechanism to the convolution layers gives a VGG 16 model based on the attention mechanism, as shown in Figure 8. Because this paper divides the sources of sag into nine categories, the fully connected layers are fine-tuned to 1 × 1 × 128 and the Softmax layer is changed to 1 × 1 × 9.
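To make the data flow of the attention module concrete, here is a minimal NumPy sketch of channel attention followed by spatial attention on a 224 × 224 × 3 input. It is not the authors' implementation: the random weights, the hidden size of 2 in the shared MLP, the zero padding, and all function names are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """M_c(F): shared MLP on average- and max-pooled channel descriptors."""
    f_avg = F.mean(axis=(0, 1))          # shape (C,)
    f_max = F.max(axis=(0, 1))           # shape (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)   # ReLU hidden layer
    return sigmoid(mlp(f_avg) + mlp(f_max))         # shape (C,)

def spatial_attention(F1, kernel):
    """M_s(F'): k x k convolution over the stacked average/max channel maps."""
    stacked = np.stack([F1.mean(axis=2), F1.max(axis=2)], axis=2)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(stacked, ((pad, pad), (pad, pad), (0, 0)))     # zero padding
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))      # (H, W, 2, k, k)
    conv = np.einsum("hwckl,klc->hw", windows, kernel)
    return sigmoid(conv)[..., None]                                  # (H, W, 1)

def cbam(F, W0, W1, kernel):
    F1 = channel_attention(F, W0, W1) * F      # F'  = M_c(F)  (x) F
    F2 = spatial_attention(F1, kernel) * F1    # F'' = M_s(F') (x) F'
    return F1, F2

# Random input and weights (assumptions for illustration).
rng = np.random.default_rng(0)
F = rng.random((224, 224, 3))
W0 = rng.normal(size=(2, 3))      # hypothetical hidden size 2
W1 = rng.normal(size=(3, 2))
kernel = rng.normal(size=(7, 7, 2)) * 0.1

F1, F2 = cbam(F, W0, W1, kernel)
print(F1.shape, F2.shape)  # (224, 224, 3) (224, 224, 3)
```

Both attention maps lie in (0, 1), so each stage rescales the features without changing their shape, in line with the dimensions stated above.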

Improved VGG Transfer Learning
Knowledge transfer relaxes the assumption that training and test samples follow the same distribution, and it greatly increases the cross-domain ability of machine learning. The purpose of transfer learning [29] is to properly introduce existing knowledge into new fields, so that machines can acquire the ability to 'draw inferences from one instance to another'. Based on the idea of transfer learning, the standard VGG 16 model is transferred to the target task of identifying voltage sag sources, which greatly improves the efficiency and accuracy of training and reduces the dependence on sample size.
ImageNet is a data set of more than 15 million images in about 22,000 categories [33]. The standard VGG 16 network has been pre-trained on the ImageNet dataset, and its pre-computed weights are used here. A sketch of the complete training process of the VGG transfer learning model is shown in Figure 9. The improved VGG transfer learning model is a combined model, which consists of the standard VGG 16 model and the attention-based VGG 16 model. Firstly, the standard VGG 16 is trained separately on the ImageNet dataset and its convolution parameters are fixed. Then, the attention-based VGG 16 is trained separately on the input data set to obtain the parameters of the attention module and the fully connected layers. Finally, the standard VGG 16 and the attention-based VGG 16 model are combined for combined training. In this case, the combined training fine-tunes the results of the separate training.

In the training process of the VGG 16 model based on the attention mechanism, the voltage sag signal is transformed into a reconstruction image using the phase space reconstruction technology described in Section 2. Voltage sag sources are divided into nine categories, namely $A_{1a}$, $A_{1b}$, $A_{1c}$, $A_{2ab}$, $A_{2ac}$, $A_{2bc}$, $A_{3abc}$, $B_1$, and $C_1$, which are used as the labels of the reconstruction images. The Adam algorithm [34] is used to optimize the model parameters and adjust the learning rate adaptively.

Voltage Sag Source Recognition Framework
The proposed framework for voltage sag source recognition based on improved VGG transfer learning is shown in Figure 10. The process is described as follows:
• Step 1: The historical data of voltage sag are read from the database. With the phase space reconstruction technology described in Section 2, labeled historical reconstruction images of the different voltage sag sources are generated.
• Step 2: The reconstruction image data from step 1 serve as the training and testing data sets in this paper and are input into the improved VGG transfer learning model of Section 3.4 for training and testing. A trained improved VGG transfer learning model is thus obtained.
• Step 3: For the voltage sag signals to be identified, the corresponding reconstruction images are generated and input into the trained model from step 2, which yields the voltage sag source recognition results.
In addition, the results of step 3 are added to the historical database for updating.

Data Acquisition
The simulation models shown in Figure 11 are built in MATLAB/SIMULINK, and their electrical parameters are changed to obtain various types of voltage sag signal samples at fault points or access points. For short circuit faults, the duration, short-circuit impedance, and line loads are changed, and 1000 groups of samples are set for each type, totaling 7000 groups. For the starting of an induction motor, the starting time, the internal parameters of the motor, and the line load are changed, totaling 1000 groups. For large transformer energizing, the switching time, transformer capacity, and connection mode are changed, totaling 1000 groups. Because actual data are affected by noise, the original 9000 groups of data are superimposed with 20 dB and 10 dB white Gaussian noise, so that, together with the noise-free originals, 27000 groups of sample data are finally obtained. The formula of the signal to noise ratio is

$$\mathrm{SNR} = 10 \cdot \lg \frac{P_s}{P_n} \tag{16}$$

where $P_s$ is the power of the sag signal and $P_n$ is the power of the noise. The unit of SNR is dB. The larger the SNR is, the smaller the noise is. The fundamental frequency of the simulation model is set to 50 Hz and the total simulation time is set to 1 s. The sampling frequency is set to 1 kHz, so each voltage signal contains 1000 sampling points. Voltage sag signal samples are transformed by phase space reconstruction into voltage sag reconstruction images, and these images are used as the input data set of the model in this paper.
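Superimposing white Gaussian noise at a prescribed SNR follows directly from Equation (16): the noise power is the signal power divided by $10^{\mathrm{SNR}/10}$. The sketch below shows one way this could be done in NumPy; the helper name `add_white_noise`, the fixed seed, and the clean 50 Hz test signal are assumptions for illustration, not the authors' MATLAB procedure.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise so that 10*log10(Ps/Pn) is about snr_db."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)                 # signal power Ps
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # required noise power Pn
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

# Illustrative clean signal (an assumption): 1 s of 50 Hz sampled at 1 kHz.
rng = np.random.default_rng(0)
t = np.arange(1000) / 1000.0
clean = np.sin(2 * np.pi * 50 * t)

noisy_20 = add_white_noise(clean, 20, rng)
noisy_10 = add_white_noise(clean, 10, rng)

# Measured SNR of the 20 dB sample; it fluctuates slightly around the target.
measured_20 = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy_20 - clean) ** 2))
```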
Four-fold cross validation was used to validate the experiment: in each type of sag source, 750 groups of samples were selected as the training set and 250 groups as the test set, and the average test accuracy over the four folds was taken as the result.
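The four-fold split can be sketched as follows for the 1000 samples of one sag source type. The generator name `four_fold_indices`, the random shuffling, and the seed are assumptions; the paper does not state how samples were assigned to folds.

```python
import numpy as np

def four_fold_indices(n_samples=1000, n_folds=4, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold serves once as the test set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

splits = list(four_fold_indices())
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 750 250 on each of the four lines
```

The reported accuracy would then be the mean of the four per-fold test accuracies.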

Figure 11. The simulation models of (a) short circuit fault, (b) large induction motor starting, and (c) unloaded transformer energizing in MATLAB/SIMULINK.

Analysis of Noise Immunity for Voltage Sag Phase Space Reconstruction
Because the actual data will be affected by noise, the original 9000 sets of data are superimposed with 20 dB and 10 dB white Gaussian noise. Three samples of phase space reconstruction images of voltage sag in phase A with 20 dB and 10 dB white Gaussian noise are shown in Figure 12.
Under strong noise, the phase trajectory is no longer smooth. Except for the motor starting type, the phase space reconstruction images are hardly affected. Therefore, the image recognition of voltage sag sources based on phase space reconstruction has a good ability to overcome noise.

Figure 12. Samples of phase space reconstruction of voltage sag with no noise (left), 20 dB (middle), and 10 dB (right) white Gaussian noise in phase A of (a) single phase short circuit fault, (b) large induction motor starting, and (c) unloaded transformer energizing.

Analysis of Attention Mechanism
The attention mechanism can enhance feature extraction, which leads to a better classification effect. Figure 13 shows, in the form of heat maps, a sample attention mechanism process for an A-and C-phase short circuit sag.
The input feature F consists of the gray scale reconstruction images of phases A, B, and C. The channel attention, spatial attention, and refined feature are shown as heat maps, where different colors represent different weights. In the channel attention, phase A is 0.51, phase C is 0.49, and phase B is 0.42, that is, phase A and phase C are more important than phase B. In the spatial attention, the two limit cycles have a weight of 0.37, which means that the sag magnitude is noticed. Combining the channel attention and spatial attention, the refined feature F" shows the different weights of the three phase signals. Phase A is about 0.96 (dark red) and phase C is about 0.85 (red), which means that they carry the effective information for deciding which type the sag belongs to. Phase B is 0.65 (yellow) and plays the least role in classification. These weights are consistent with the characteristics of an A-and C-phase short circuit fault. Through the attention mechanism, the feature extraction process is strengthened and the final classification results can be more accurate.
The attention mechanism makes the model pay more attention to the effective information and ignore the invalid information. The CBAM attention used in this paper is a combination of channel attention and spatial attention: channel attention indicates which of phases A, B, and C is more important to the results, and spatial attention indicates which pixels and regions in the images of phases A, B, and C are more important. Combining the two makes the model focus on the areas that have great influence on the results, which improves the effectiveness and interpretability of the model.

Analysis of Classification Effect of Feature Vector
Before the output layer, the VGG 16 model based on the attention mechanism has a fully connected layer with 128 dimensions, which extracts 128 final features of the sag. These 128 values are characteristic quantities reflecting the reconstructed image itself and have no direct physical meaning. As shown in Table 1, for example, the feature data within the same class are similar, while the feature data of different classes are clearly distinguished.
In order to observe the quality of automatic feature extraction in the proposed model, the t-Distributed Stochastic Neighbor Embedding (t-SNE) [35] algorithm can be used to project the extracted feature vectors into three-dimensional space. The three-dimensional projection of the extracted feature vectors is shown in Figure 14. It can be seen that the projection boundaries of the nine types of voltage sag feature vectors extracted by the proposed model are clear, and the types are well separated from each other.

Network Training Process and Contrastive Analysis
• Separate training of VGG 16 model. The VGG 16 model is trained on its own, without the parameters pre-trained on the ImageNet data set. As shown in Figure 15a, the loss of the network does not decrease as the number of iterations increases, and the accuracy of the model stays below 50%, which is very poor. If a VGG 16 model without pre-trained parameters were used to identify voltage sag sources, more sample data and more training iterations would be needed to achieve good results, which is inefficient, and the results would not necessarily be ideal. Therefore, it is necessary to introduce the idea of transfer learning and carry out combined training of the model.

• Combined training of VGG 16 transfer learning model without attention mechanism. VGG 16 without the attention mechanism and the standard VGG 16 are used to construct a VGG 16 transfer learning model without the attention mechanism, and combined training is carried out. As shown in Figure 15b, as the number of iterations increases, the loss of the network decreases gradually and the accuracy of the generated model increases gradually, approaching 96%. However, the stability of the network loss and the model accuracy is poor. In particular, over-fitting occurs in the late iterations and causes large oscillations.

• Combined training of improved VGG transfer learning model. The attention mechanism is added to construct the improved VGG transfer learning model shown in Section 3.4, and combined training is carried out. As shown in Figure 15c, as the number of iterations increases, the loss of the network decreases gradually and the accuracy of the generated model increases gradually, approaching 100%. Moreover, in the late iterations, the attention mechanism restrains over-fitting, so the network loss and the model accuracy are more stable.

• Training using voltage sag signal image. Without phase space reconstruction, the improved VGG transfer learning model is trained directly on voltage sag signal images. As shown in Figure 15d, as the number of iterations increases, the loss of the network decreases and the accuracy of the generated model increases gradually, but convergence is slower than in Figure 15c, and the accuracy is only about 90%, which is lower than the result obtained with samples reconstructed by phase space reconstruction. This is because the instantaneous waveforms of voltage sag signals are dense, and over a long time window the signal images of different sag sources do not differ noticeably.

Result Analysis
In order to evaluate the recognition results for each type of sag source, Accuracy and F1 [36] are selected as the evaluation indexes. Accuracy is a single index and F1 is a comprehensive index:

$$\mathrm{Accuracy} = \frac{C}{T}, \qquad \mathrm{Precision} = \frac{C}{P}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Accuracy}}{\mathrm{Precision} + \mathrm{Accuracy}}$$

where C is the number of samples correctly recognized for a certain type of voltage sag source, T is the number of samples that truly belong to this type, and P is the number of all samples recognized as this type. Accuracy is the ratio of the number of samples correctly recognized for this type to the number of samples that truly belong to this type, and Precision is the ratio of the number of samples correctly recognized for this type to the number of all samples recognized as this type.
The identification results of voltage sag sources are shown in Table 2. For the reconstruction images of noise-free voltage sag signals, the average recognition Accuracy and F1 are both 100%, which indicates the feasibility of recognizing voltage sag sources from images. As the noise increases, the average recognition Accuracy decreases to 98.3% and the F1 decreases to 95.4%; both remain at a high level and reflect the anti-noise ability of the method. From the point of view of practical application, the average recognition Accuracy and F1 of the method are high for the single-phase short circuit sag source types, which account for a large proportion of sags in power systems: 99.4% and 98.2% for type A 1a, 99.2% and 98.2% for type A 1b, and 99.5% and 97.9% for type A 1c.
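The per-class indexes defined in terms of C, T, and P can be computed directly from lists of true and recognized labels. The sketch below is only an illustration: the function name `per_class_metrics` and the toy labels are hypothetical and do not come from the experiments in Table 2. Note that the index called Accuracy here, C divided by T, corresponds to what is often called recall elsewhere.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (accuracy, precision, f1) for one sag source type.

    C: samples of this type that were recognized correctly
    T: samples that truly belong to this type
    P: samples recognized as this type
    """
    c = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    t_count = sum(1 for t in y_true if t == label)
    p_count = sum(1 for p in y_pred if p == label)

    accuracy = c / t_count if t_count else 0.0
    precision = c / p_count if p_count else 0.0
    denom = precision + accuracy
    f1 = 2 * precision * accuracy / denom if denom else 0.0
    return accuracy, precision, f1

# Toy labels (hypothetical): four "A1a" samples and two "other" samples.
y_true = ["A1a", "A1a", "A1a", "A1a", "other", "other"]
y_pred = ["A1a", "A1a", "A1a", "other", "other", "other"]

acc_a, prec_a, f1_a = per_class_metrics(y_true, y_pred, "A1a")
acc_o, prec_o, f1_o = per_class_metrics(y_true, y_pred, "other")
print(acc_a, prec_a, f1_a)
print(acc_o, prec_o, f1_o)
```

For the toy "A1a" class, C = 3, T = 4, and P = 3, so Accuracy is 0.75 and Precision is 1.0. F1 combines the two as their harmonic mean, which is why it is described as a comprehensive index.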
The methods of references [4,5,15,17] (hereinafter referred to as method 1, method 2, method 3, and method 4) are compared with the proposed method in terms of the experimental results of voltage sag source identification. Table 3 shows that the recognition accuracy of the proposed method for different voltage sag sources is relatively high. This is because the attention mechanism is added in the process of automatic feature extraction, which improves the expression ability and generalization ability of the model. Table 4 shows that the proposed method converges faster, because transfer learning reduces the training cost and improves the training efficiency.

Conclusions
In this paper, a method of voltage sag source identification based on phase space reconstruction and improved VGG transfer learning is proposed. The effectiveness and superiority of the proposed method are verified by an example and are mainly embodied in the following aspects:
• The voltage sag signal is reconstructed into a phase space image, which not only retains the complete characteristics of the sag but also has more intuitive and concise image features.
• The attention mechanism is added to the VGG model to further extract image features automatically and to prevent over-fitting. It gives an excellent classification effect and improves the recognition accuracy of the model.
• The idea of transfer learning is introduced to train the network on the basis of other image classification results, which improves the efficiency of network training.
Image recognition depends on the accuracy of the images. In this paper, voltage sag sources are identified only from voltage sag signals of a single sampling frequency. How to ensure the accuracy of the voltage sag reconstruction images under different sampling frequencies is the next research direction.