Series Arc Fault Detection in a Low-Voltage Power System Based on CEEMDAN Decomposition and Sensitive IMF Selection

In series arc fault detection for low-voltage distribution networks, the features of the fault current signal are easily submerged and are difficult to represent, which greatly increases the difficulty of current-based arc fault detection. To solve these problems, a series arc fault detection method combining CEEMDAN decomposition and sensitive IMF selection is proposed. First, the CEEMDAN algorithm is applied to obtain a complete decomposition of the series arc fault current. Then, 12 feature indicators of the arc current are defined, and the IMF components are divided into frequency bands according to the kurtosis index and the energy feature, which are the most sensitive of these indicators. A time window-based feature calculation method is proposed to obtain the local time-scale features of each high-frequency IMF component. The sensitive IMF component is then selected by comparing feature indexes such as the variance and the root-mean-square value. Finally, a subspace transformation algorithm performs a second dimension reduction on the current feature set, and series arc fault detection is carried out with an SVM. Experiments show that the best detection accuracy of the proposed method is 91.67% and the average accuracy over 10 cross-validation experiments is 88.33%. These results indicate that the proposed sensitive IMF selection method can effectively capture the fault component signals in the current and that the proposed fault feature description has good representation and discrimination ability.


Introduction
In low-voltage distribution networks, arc faults are easily caused by damaged line insulation and loose terminals. The local high temperature associated with an arc fault can easily lead to an electrical fire. Arc faults are divided into series arcs, parallel arcs, and ground arcs. A series arc fault is equivalent to a time-varying resistor connected in series with the circuit, so the fault current tends to resemble the load current. Sometimes, the waveform characteristics of the fault current are difficult to distinguish from those of a nonlinear load current [1][2][3]. These factors make series arc detection very difficult and have made it an active and challenging research area in arc detection [4,5].
Arc fault detection methods can be divided into two categories: (1) arc detection based on physical characteristics such as arc light, arc sound, and temperature and (2) arc detection based on time-frequency domain analysis of the arc voltage or current signal [6,7]. Because the location of an arc fault is random, the first category is mostly used in electrical switchgear and its application in line arc fault detection is limited [8,9]. Time-frequency analysis based on the current and voltage signals of monitoring points has therefore become a research hotspot in arc fault detection. The current detection method can cover arc faults on downstream branches by installing the monitoring point upstream on the line. Therefore, compared with the voltage detection method [10], it is more widely applicable and more flexible, and it is favored by researchers [11,12].
The Fourier transform was widely used in early arc fault detection [13]. The essence of this method is to decompose the electrical signal into a superposition of sine waves of multiple frequencies, which transforms the time domain problem into the frequency domain for analysis. The Fourier transform relates the time domain and frequency domain of electrical signals, but the signal can only be analyzed in the time domain or in the frequency domain separately, not in both simultaneously. Owing to its good adaptability, the wavelet transform can realize multiscale time-frequency analysis of a signal, and it has been applied to the analysis of mechanical fault signals [14] and arc fault signals [15,16]. Wang et al. [17] proposed a hybrid time and frequency analysis and fully connected neural network (HTFNN) based method to identify series AC arc faults. First, the samples are roughly divided into the resistive category, capacitive-inductive category, and switching category. Then, in each category, a separate fully connected neural network is used to identify the fault, and the method achieves high identification accuracy. Chu et al. [18] proposed a novel high-frequency coupling sensor for extracting the features of low-voltage series arc faults. In this method, high-frequency feature signals under different loads are collected and transformed into two-dimensional gray feature images, which are used to train a convolutional neural network for series arc fault detection. Experiments show that the method is stable and universal.
The Hilbert-Huang transform (HHT) [19] is a typical nonlinear and nonstationary signal processing method, and its key step is empirical mode decomposition (EMD). EMD can adaptively decompose complex signals into several intrinsic mode functions (IMFs), but it suffers from a serious mode mixing problem, which affects the performance of HHT [20]. Therefore, Wu and Huang [21] proposed ensemble empirical mode decomposition (EEMD). By introducing Gaussian white noise with a uniform frequency distribution into the signal to be decomposed, EEMD overcomes the problem of signal intermittency and avoids mode mixing. However, due to the interference of the white noise, the reconstructed signal is easily distorted. Cheng et al. [22] proposed an enhanced periodic mode decomposition (EPMD) algorithm for accurate extraction of periodic pulses from rolling bearing composite fault signals, which effectively improved the accuracy of bearing fault diagnosis.
In 2011, Torres et al. [23] proposed the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which further improves the accuracy and completeness of decomposition signals and effectively overcomes the problem of mode mixing. However, the CEEMDAN algorithm has not been applied in arc current signal analysis. CEEMDAN can achieve complete signal decomposition, but in arc fault detection, usually, only a few IMF components are sensitive to the arc fault and can reflect the characteristics of the fault arc. Most of the other IMF components are invalid for arc detection and even contain more interference information. Therefore, it is extremely difficult to extract fault identification features from all IMF components obtained by CEEMDAN decomposition and the interference features easily affect the accuracy of arc fault detection.
Based on the above analysis, this paper proposes a series arc fault detection method based on CEEMDAN decomposition and sensitive IMF selection. The CEEMDAN algorithm is used to decompose the current signal and obtain the complete set of IMF components. Then, a strategy is proposed to automatically select the sensitive IMF from all the IMF components. In this strategy, the kurtosis index and energy feature are taken as the basis for determining the fundamental frequency boundary and dividing the frequency bands. For the high-frequency IMFs, we design a local feature extraction method based on time windows. Using a whole number of fundamental frequency periods as the interval, each IMF in the high-frequency band is divided into nonoverlapping time windows and the feature indexes of the signal are calculated in each time window. The sensitive IMF component with the strongest discriminability is selected adaptively based on feature indexes such as the variance and root-mean-square amplitude. After sensitive IMF selection, the local features of the best IMF component are used as the feature description of the current signal, from which the current feature database is constructed. Finally, a subspace transformation algorithm is used for secondary dimension reduction of the current features and the support vector machine (SVM) is used for series arc fault detection. Experimental results show that the combination of CEEMDAN decomposition and the sensitive IMF selection strategy, together with the time window-based local feature construction method, can effectively capture the discriminant features of the series arc and thus achieve reliable detection of the arc fault.
The main highlights of the proposed method are summarized as follows: (1) To obtain complete decomposition results of fault current signals, the CEEMDAN decomposition algorithm is first applied to current signal decomposition.
The remainder of this paper is structured as follows: in Section 2, the CEEMDAN algorithm is described. The feature calculation methods of the current signal are illustrated in Section 3. In Section 4, the selection method of the sensitive IMF component and the series arc fault detection method are proposed. Detailed experiments and analyses are performed in Section 5. In Section 6, the conclusions are drawn.

CEEMDAN Algorithm
Mode mixing refers to the phenomenon that a single IMF component contains multiple components with different frequencies, or that the same frequency component is decomposed into different IMF components. Mode mixing is usually caused by the intermittency of the signal. Therefore, the EEMD algorithm introduces Gaussian white noise into the signal to be decomposed, which makes the signal continuous at different scales and alleviates the mode mixing problem. However, the EEMD algorithm cannot completely eliminate the introduced noise, which distorts the reconstructed signal. At each stage of the decomposition, the CEEMDAN algorithm adaptively adjusts the noise coefficient so that Gaussian noise with a different SNR is added to the signal to be decomposed, which avoids mode mixing and eliminates the interference of false information. The algorithm steps are as follows:
(1) Gaussian white noise $n_i(t)$ is added to the original signal $x(t)$. The signal with added noise is $x(t) + \gamma_0 n_i(t)$, where $\gamma_0$ is the noise coefficient. EMD is performed $I$ times on the noisy signal, once for each noise realization $i = 1, \dots, I$. The first IMF component $\mathrm{IMF}_1(t)$ and the corresponding residual component $r_1(t)$ of CEEMDAN are obtained through ensemble averaging:
$$\mathrm{IMF}_1(t) = \frac{1}{I}\sum_{i=1}^{I}\mathrm{EMD}_1\left[x(t) + \gamma_0 n_i(t)\right], \qquad r_1(t) = x(t) - \mathrm{IMF}_1(t),$$
where $I$ is the number of decompositions.
(2) Let $\mathrm{EMD}_j(\cdot)$ denote the $j$th mode function obtained by EMD. Decompose the signal $r_1(t) + \gamma_1 \mathrm{EMD}_1[n_i(t)]$ to obtain the second IMF component of CEEMDAN:
$$\mathrm{IMF}_2(t) = \frac{1}{I}\sum_{i=1}^{I}\mathrm{EMD}_1\left[r_1(t) + \gamma_1 \mathrm{EMD}_1[n_i(t)]\right].$$
(3) Calculate the $k$th-order residual component:
$$r_k(t) = r_{k-1}(t) - \mathrm{IMF}_k(t),$$
where $\mathrm{IMF}_k(t)$ is the $k$th IMF component. EMD is performed on the $k$th signal $r_k(t) + \gamma_k \mathrm{EMD}_k[n_i(t)]$ until its first IMF component is obtained. On this basis, the $(k+1)$th IMF component of CEEMDAN is calculated:
$$\mathrm{IMF}_{k+1}(t) = \frac{1}{I}\sum_{i=1}^{I}\mathrm{EMD}_1\left[r_k(t) + \gamma_k \mathrm{EMD}_k[n_i(t)]\right].$$
(4) The above steps are repeated until the residual component can no longer be decomposed and all $K$ IMF components of CEEMDAN are obtained; the remaining residual $R(t)$ is
$$R(t) = x(t) - \sum_{k=1}^{K}\mathrm{IMF}_k(t).$$
Therefore, after decomposition, the initial signal can be expressed as
$$x(t) = \sum_{k=1}^{K}\mathrm{IMF}_k(t) + R(t).$$
The CEEMDAN method can realize complete reconstruction of the original signal based on noise-assisted analysis. Gaussian noise with a different SNR at each stage is adjusted adaptively through the noise coefficient and introduced into the signal to be decomposed, which effectively improves the decomposition performance of EMD.
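The recursion in steps (1) to (4) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: `toy_mode` is a hypothetical stand-in for the EMD operator (it extracts detail with a moving average instead of sifting), the noise coefficient `gamma` is held constant across stages, and the trial count and seed are arbitrary. Only the structure of the ensemble averaging and the residual updates follows the steps above.

```python
import random


def moving_average(x, width=5):
    """Centered moving average; the window is clipped at the signal edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def toy_mode(x, k):
    """Hypothetical stand-in for EMD_k: detail removed by the k-th smoothing pass."""
    residual = list(x)
    mode = residual
    for _ in range(k):
        trend = moving_average(residual)
        mode = [a - b for a, b in zip(residual, trend)]
        residual = trend
    return mode


def ceemdan(x, n_modes, trials=20, gamma=0.2, mode_fn=toy_mode, seed=0):
    """Return (imfs, residual) following the CEEMDAN recursion.

    Assumption: the same noise coefficient gamma is used at every stage.
    """
    rng = random.Random(seed)
    n = len(x)
    # One Gaussian white-noise realization n_i(t) per trial, reused at every stage.
    noises = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(trials)]
    imfs = []
    residual = list(x)
    for k in range(n_modes):
        acc = [0.0] * n
        for noise in noises:
            # Stage 0 adds the raw noise; stage k adds the k-th mode of the noise.
            pert = noise if k == 0 else mode_fn(noise, k)
            noisy = [r + gamma * p for r, p in zip(residual, pert)]
            first = mode_fn(noisy, 1)  # first mode of the perturbed residual
            acc = [a + f for a, f in zip(acc, first)]
        imf = [a / trials for a in acc]  # ensemble average over the trials
        imfs.append(imf)
        residual = [r - m for r, m in zip(residual, imf)]  # r_k = r_{k-1} - IMF_k
    return imfs, residual
```

By construction, the returned IMFs and the final residual sum back to the input signal, which mirrors the completeness property stated above. A real application would replace `toy_mode` with a proper EMD sifting routine.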

Feature Calculation of the Current Signal
Drawing on the feature calculation methods commonly used in mechanical fault diagnosis [24,25], this paper defines, for each IMF component of the current signal, 9 statistical feature indicators together with the energy feature, entropy feature, and energy entropy feature, forming a 12-dimensional current feature vector. Assume that $K$ IMF components are obtained after CEEMDAN decomposition and that each component sequence contains $N$ sampling points, i.e., $x_1, x_2, \cdots, x_i, \cdots, x_N$, where $i$ denotes the $i$th sampling point. The 12 feature indicators are the mean value, the variance $\sigma^2$, the root-mean-square value $X_{RMS}$, the root amplitude $X_r$, the average amplitude $X_a$, the peak, the kurtosis, the kurtosis index, the margin index, the energy feature $E$, the entropy feature, which is computed from the probability $P(\cdot)$, and the energy entropy feature $H_{EN}$.
After the above calculation, each IMF component containing $N$ sampling points is transformed into a 12-dimensional eigenvector, which reflects the overall state of the signal sequence.
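As an illustration of how such a 12-dimensional vector could be computed, the sketch below uses definitions that are common in the fault diagnosis literature. These are assumptions rather than the paper's exact formulas: the variance is the population variance, the peak is the maximum absolute value, the kurtosis is the uncentered fourth moment, the kurtosis index divides it by the fourth power of the RMS value, the margin index is peak over root amplitude, the entropy uses a 16-bin histogram as the probability estimate, and the energy entropy uses each sample's share of the total energy. The function name and dictionary keys are hypothetical.

```python
import math
from collections import Counter


def feature_vector(x, bins=16):
    """Compute 12 illustrative feature indicators for one signal segment."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n           # population variance
    rms = math.sqrt(sum(v * v for v in x) / n)
    root_amp = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    avg_amp = sum(abs(v) for v in x) / n
    peak = max(abs(v) for v in x)
    kurtosis = sum(v ** 4 for v in x) / n                    # uncentered 4th moment
    kurtosis_index = kurtosis / rms ** 4 if rms > 0 else 0.0
    margin_index = peak / root_amp if root_amp > 0 else 0.0
    energy = sum(v * v for v in x)

    # Entropy: probabilities estimated from an equal-width amplitude histogram.
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0                          # avoid zero bin width
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in x)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())

    # Energy entropy: each sample's share of the total energy as a probability.
    energy_entropy = 0.0
    if energy > 0:
        for v in x:
            p = v * v / energy
            if p > 0:
                energy_entropy -= p * math.log(p)

    return {
        "mean": mean, "variance": variance, "rms": rms,
        "root_amplitude": root_amp, "average_amplitude": avg_amp,
        "peak": peak, "kurtosis": kurtosis,
        "kurtosis_index": kurtosis_index, "margin_index": margin_index,
        "energy": energy, "entropy": entropy,
        "energy_entropy": energy_entropy,
    }
```

Applying `feature_vector` to each of the K IMF components would give a K × 12 feature matrix of the kind used later in the experiments.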

Fundamental Frequency Determination and Frequency Division
The 12 feature indexes defined in this paper reflect the characteristics of IMF components at different frequencies, among which the kurtosis index and energy feature have strong sensitivity to current signal fluctuations. Therefore, based on these two indicators, this paper divides the frequency bands for all IMF components. In our study, it is found that when the kurtosis index is the minimum value and the energy feature is the maximum value, the corresponding IMF component has the smallest fluctuation range and contains the most information. This component is the fundamental frequency component of the arc current signal, which is selected as the boundary of division. Therefore, the IMF components above this frequency are the high-frequency signals and the IMF components below this frequency are the low-frequency signals.
Since the current that causes the arc fault is usually a high-frequency signal, feature index calculation and feature extraction only need to be carried out for the initially selected high-frequency signals after frequency division. This not only reduces the complexity of feature index calculation, feature extraction, and fault detection but also avoids the interference of low-frequency signals and improves the accuracy of arc detection.
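The frequency division rule above can be sketched as a small selection routine. This is a minimal sketch under assumptions: the IMFs are ordered from highest to lowest frequency (as CEEMDAN produces them), indexes are 0-based, and the function names are hypothetical. The text does not say what to do when the minimum kurtosis index and the maximum energy fall on different components, so the sketch simply returns `None` in that case.

```python
def fundamental_index(kurtosis_index, energy):
    """Return the 0-based index of the IMF that has both the minimum kurtosis
    index and the maximum energy, or None if the two criteria disagree."""
    k_min = min(range(len(kurtosis_index)), key=kurtosis_index.__getitem__)
    e_max = max(range(len(energy)), key=energy.__getitem__)
    return k_min if k_min == e_max else None


def split_bands(n_imfs, fundamental):
    """Split IMF indexes into (high-frequency, low-frequency) bands around
    the fundamental component, which belongs to neither band."""
    high = list(range(fundamental))
    low = list(range(fundamental + 1, n_imfs))
    return high, low


# Made-up values for 14 IMFs (not taken from the paper's tables).
kurt = [3.0 + 0.1 * i for i in range(14)]
kurt[9] = 1.5          # smallest kurtosis index at the 10th component
ener = [1.0] * 14
ener[9] = 100.0        # largest energy at the 10th component

idx = fundamental_index(kurt, ener)
high_band, low_band = split_bands(14, idx)
```

With these illustrative numbers, `idx` is 9 (the 10th IMF), the high-frequency band holds the first nine components, and the low-frequency band holds the last four.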

Local Feature Calculation and Selection of Sensitive IMF Components
The global features, computed from all $N$ sampling points of an IMF component, reflect the overall change of the signal and characterize the different frequency components from a macroscopic view. In the high-frequency band, the global features of some IMF components with similar frequencies are almost the same and the crucial features ...
The variance $\sigma^2$, root-mean-square value $X_{RMS}$, root amplitude $X_r$, average amplitude $X_a$, energy feature $E$, and energy entropy feature $H_{EN}$ of each time window in the IMF component are selected as the judgment indexes:
(1) If the fluctuation of the above judgment indexes between adjacent time windows (such as the $j$th and $(j+1)$th windows) is less than the threshold $\theta$, the signals in the windows of the IMF component are stable. That is, the IMF does not contain the current information causing the fault arc.
(2) If the judgment indexes show an obvious jump between adjacent time windows (such as the $j$th and $(j+1)$th windows) and the jump amplitude is greater than or equal to the threshold $\theta$, the signal in the IMF component is not stable. That is, the IMF contains the current information causing the fault arc.
After the sensitive IMF component is selected, the preliminary feature description of the arc current is obtained. However, these features still contain a lot of redundant information and even interference information. Therefore, to improve the discriminant ability of the features, it is very important to perform secondary feature extraction and dimension reduction on the preliminary arc current features.
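The time-window stability check described in this section can be sketched as follows. This is a minimal sketch under assumptions: the fluctuation between adjacent windows is measured as a relative change, only one judgment index (the RMS value) is shown, and the threshold value used in the example is arbitrary. The function names are hypothetical.

```python
import math


def split_windows(x, n_windows):
    """Split a sequence into n_windows equal, nonoverlapping windows."""
    size = len(x) // n_windows
    return [x[j * size:(j + 1) * size] for j in range(n_windows)]


def rms(window):
    """Root-mean-square value of one window."""
    return math.sqrt(sum(v * v for v in window) / len(window))


def max_jump(values):
    """Largest relative change between adjacent window values."""
    jumps = []
    for a, b in zip(values, values[1:]):
        base = max(abs(a), abs(b), 1e-12)  # guard against division by zero
        jumps.append(abs(b - a) / base)
    return max(jumps)


def is_sensitive(imf, n_windows, theta, index_fn=rms):
    """True if the chosen index jumps by at least theta between two
    adjacent windows, i.e. the component is not stable."""
    values = [index_fn(w) for w in split_windows(imf, n_windows)]
    return max_jump(values) >= theta


# Synthetic examples: a stable component and one whose amplitude triples halfway.
stable = [1.0 if i % 2 else -1.0 for i in range(1000)]
jumping = stable[:500] + [3.0 * v for v in stable[500:]]
```

With 10 windows and `theta = 0.5`, `is_sensitive(stable, 10, 0.5)` is false and `is_sensitive(jumping, 10, 0.5)` is true. In practice the same test would be repeated for each judgment index.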
Subspace mapping maps feature vectors from the original space to a new space by a mathematical transformation, so that the feature vectors in the new space have a lower dimension and more significant discriminant ability. The classic subspace feature extraction methods in machine learning include PCA, LDA, ICA, KPCA, and KLDA [26]. In this work, the linear subspace mapping algorithms PCA and LDA and the nonlinear subspace mapping algorithms KPCA and KLDA are used for the secondary extraction and dimension reduction of arc current features.
The support vector machine (SVM) [27] is a typical binary classifier, which has unique advantages in solving small sample, nonlinear, and high dimension pattern recognition problems. It can mine the hidden decision information of sample features to the maximum extent, and it is widely used in the field of fault diagnosis. Therefore, in this paper, the SVM is selected as the classifier of series arc fault detection.
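The experiments in Section 5 use a Gaussian kernel in the SVM and tune an adjustable kernel parameter $\sigma$ for KPCA and KLDA. The paper does not spell out the kernel's exact form, so the sketch below assumes the common parameterization $\exp(-\lVert a - b \rVert^2 / (2\sigma^2))$; the function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors.

    Assumption: sigma enters as exp(-||a - b||^2 / (2 * sigma^2)).
    """
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma):
    """Pairwise kernel (Gram) matrix for a list of feature vectors."""
    return [[gaussian_kernel(a, b, sigma) for b in samples] for a in samples]
```

Kernel methods such as KPCA, KLDA, and a kernel SVM work on this pairwise matrix instead of on the raw feature vectors, which is why the choice of $\sigma$ has a direct effect on detection accuracy.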
In summary, the overall flow of the proposed algorithm is shown in Figure 1.
As shown in Figure 2, the current signal is collected by a recording device. The sampling frequency is set to 50 kHz, and the sampling length is set to 20 ms. The current dataset constructed under different loads is shown in Table 1.

Experiment and Analysis
The loads in an actual circuit can be divided into three types: pure resistor loads, resistor-inductance loads, and nonlinear loads. An 800 W electric furnace is selected as the pure resistor load. When the furnace works at 400 W, it operates in a half-wave rectification state. Therefore, the 800 W and 400 W electric furnaces basically cover the current characteristics of resistor loads. A computer is a nonlinear load, and its current waveform can represent most switching power supply loads. A microwave oven is a nonlinear load with more inductance, which can represent the current characteristics of most resistor-inductance loads. These 4 kinds of loads are household appliances in frequent everyday use and are typical of most loads. In the experiment, 10 groups of normal current data and 30 groups of fault current data are collected under each load. Figure 3 shows the comparison of current waveforms before and after the arc fault, in which the first 5 periods are fault-free signals and the last 5 periods are fault signals.
Five groups of normal current data and 20 groups of fault arc current data under each of the 4 kinds of loads are taken as training samples. Therefore, the training sample set size is 100.

Test Dataset.
The remaining 5 groups of normal current and 10 groups of fault arc current under each of the 4 kinds of loads are taken as the test sample set, and the total number of test samples is 60. To ensure reliable detection results, the average accuracy of the proposed algorithm is evaluated over 10 cross-validation experiments.
Experiments show that the CEEMDAN algorithm adaptively decomposes the current signals under computer load and microwave load into 14 components and realizes a detailed and complete decomposition of the arc current signals in different frequency ranges, which effectively overcomes the mode mixing problem of the EMD algorithm. In the decomposition results of Figures 4(b) and 5(b), the component IMF10 contains 10 complete sinusoidal periods whose frequency is consistent with that of the original current signal. Therefore, IMF10 corresponds to the fundamental frequency component. With IMF10 as the boundary, IMF1 to IMF9 are classified as high-frequency signals, and IMF11 to IMF14 are classified as low-frequency signals. The simple and clear division of the frequency bands indicates that the signal decomposition is complete and no mode mixing occurs.
In addition, under computer load, the high-frequency components IMF4, IMF5, and IMF6 show significant differences on the time scale, and IMF5 and IMF6 in particular have good discrimination. Under microwave load, the waveforms from IMF4 to IMF7 are significantly different before and after arc occurrence.
In conclusion, the CEEMDAN decomposition strategy has a significant advantage in overcoming mode mixing, and the decomposed component signals are highly discriminative. Thus, it is feasible to select the CEEMDAN algorithm for arc current signal decomposition in this paper.

Feature Calculation of the Arc Fault Current and Sensitive IMF Selection
To give a better mathematical description of each IMF component and reduce the complexity of the fault detection operation, we adopt statistical feature indexes such as the mean and variance, together with the energy feature, entropy feature, and energy entropy feature, for a total of 12 feature values describing each IMF component. Taking the decomposition results shown in Figure 5(b) as an example, the 12 features of the 14 IMF components are calculated and a feature matrix of dimension 14 × 12 is constructed, as shown in Table 2.
The data in Table 2 show that the 12 feature indexes reflect different characteristics of the multiscale IMF components, among which the kurtosis index and energy feature have strong discrimination. Therefore, by comparing the kurtosis index and energy feature of each IMF component, it can be clearly judged that IMF10 is the fundamental frequency component. This component can be used as the boundary of frequency division. IMF1-IMF9 correspond to the high-frequency signals and IMF11-IMF14 correspond to the low-frequency signals. This conclusion is consistent with that obtained from the direct observation of Figure 5(b) in Section 5.3, which further supports the effectiveness of the proposed frequency division strategy.
The current signals collected in our experiment all contain 10 periods. Therefore, according to the time window-based local feature calculation method, each IMF component in the high-frequency band can be divided into 10 windows. Then, the 12 feature indexes such as the mean and variance are calculated for each window. Table 3 shows the feature calculation results of IMF5 in Figure 5(b) divided into 10 time windows.
According to the data in Table 3, by comparing the indexes such as the variance, root mean square, root amplitude, average amplitude, energy feature, and energy entropy feature, we can see that the index values of the first 5 periods are significantly different from those of the last 5 periods. It shows that this IMF component contains the information that causes the arc fault, so this IMF component can be used as one of the candidates of sensitive IMF components.
To accurately select the most sensitive IMF component, the decomposition results under microwave oven load are taken as an example. We calculate the time-window local features of the 9 high-frequency components IMF1 to IMF9. After a comprehensive comparison of all feature values, it is found that the features of IMF4 show the largest variation amplitude between the first 5 periods and the last 5 periods and the strongest distinguishing significance. The feature values of the other IMF components also change to some extent between the first and last 5 periods, but the amplitude of change is lower than that of IMF4 and the degree of differentiation is weak. Therefore, the IMF4 component is selected as the most sensitive component according to the principle of significant differentiation and maximum variation amplitude. Similarly, in the decomposition results under the computer load in Figure 4(b), IMF6 is the most sensitive component.
In summary, we select the most discriminating sensitive IMF component, and the fault arc current signal is then characterized by 10 time windows and 12 feature indexes. Thus, each original current signal of 10,000 sampling points is converted into a vector with 120 feature values. After time-window local feature extraction, the 160 groups of normal current and fault arc current data under the 4 load types form a current feature database with a scale of 160 × 120.
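The sizes quoted in this section follow directly from the counts given earlier (10 and 30 groups per load, 4 loads, 10 windows, 12 indexes). As a worked check, with the per-window duration derived from the stated 50 kHz sampling frequency:

```latex
\begin{align*}
\text{samples per window} &= 10{,}000 / 10 = 1{,}000, \\
\text{window duration} &= 1{,}000 / (50 \times 10^{3}\ \text{Hz}) = 20\ \text{ms}, \\
\text{features per signal} &= 10 \times 12 = 120, \\
\text{signals in the database} &= 4 \times (10 + 30) = 160, \\
\text{test samples} &= 4 \times (5 + 10) = 60, \\
\text{training samples} &= 160 - 60 = 100.
\end{align*}
```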

Fault Arc Detection Experiment
The 160 × 120 dimension current feature database still contains some redundant information and interference information, which not only increases the complexity of arc fault detection but also affects the accuracy of detection. Therefore, to mine the features with higher discrimination, the subspace transformation method is used to perform secondary feature extraction for the current feature. The subspace transformation methods adopted in this paper include linear subspace methods PCA and LDA and nonlinear subspace methods KPCA and KLDA.
Series fault arc detection is a binary classification problem, and the SVM is used as the fault detector in this paper. In the experiment, a Gaussian kernel is used as the kernel function of the SVM. Through a cross-validation grid search, the optimal penalty factor of the SVM is determined to be 10 and the adjustable parameter of the kernel function is determined to be 0.55.
The main influencing factor of PCA and LDA is the retained feature dimension d during feature extraction. The influence factors of KPCA and KLDA include reserved feature dimension d and adjustable parameter σ in kernel function. Therefore, comparison experiments are performed to determine the parameter settings for optimal performance of each algorithm. Figure 6 shows the influence of PCA and LDA feature dimensions on detection accuracy in a cross experiment. Figure 7 shows the influence of KPCA and KLDA feature dimensions and the kernel function adjustable parameter on detection accuracy in a cross experiment.
As shown in Figure 6, the curves of arc fault detection accuracy have similar trends. When the dimension is lower than 25, the detection accuracy is generally low. At this stage, as the dimension gradually increases, the effective information contained in the feature gradually increases, so the detection accuracy increases rapidly. When the dimension is 30-45, the detection accuracy reaches a high value but there is a small fluctuation affected by the validity of the feature. When the dimension exceeds 45, interference information will be introduced into retained features, so the detection accuracy will decrease slightly. At this stage, as the dimension continues to increase, the detection accuracy is generally stable, remaining at around 80%.
In addition, because LDA is a supervised algorithm while PCA is an unsupervised feature extraction algorithm, LDA captures strongly discriminant features more easily than PCA. Therefore, LDA shows better detection performance. In this experiment, the optimal detection accuracy of PCA-SVM is 85% at a feature dimension of 34, and the optimal detection accuracy of LDA-SVM is 88.33% at a feature dimension of 35. After 10 cross-validation experiments, the average detection accuracy of PCA-SVM is 77.2% and that of LDA-SVM is 81.5%.
Figure 7 shows the influence of feature dimensions on detection accuracy when the kernel function adjustable parameter has different values. The effect of the retained feature dimension on detection accuracy is similar to that for the linear subspace algorithms. The adjustable parameter of the kernel function plays a key role in improving the detection accuracy. As shown in Figure 7, the optimal value of the adjustable parameter of the KPCA algorithm is 0.4, the optimal retained feature dimension is 30, and the highest detection accuracy is 88.33%. The optimal value range of the adjustable parameter of the KLDA algorithm is 0.3-0.4, the optimal retained feature dimension is 30, and the highest detection accuracy reaches 91.67%. After comprehensive analysis, the kernel function adjustable parameter of the KLDA algorithm is set to 0.35. The optimal feature dimension of both algorithms is 30.
To demonstrate the reliability and effectiveness of the proposed algorithm, 10 cross-validation experiments are performed and the average detection accuracy of all experiments is calculated as the final performance evaluation. The results and average accuracy of the 10 cross-validation experiments are shown in Figure 8.
In general, the performance of the nonlinear subspace transformation algorithms is better than that of the linear algorithms. Although the LDA-SVM algorithm outperforms the KPCA-SVM algorithm in the 1st, 9th, and 10th experiments, the overall average detection accuracy of KPCA-SVM is 82.7%, which is better than the 81.5% of LDA-SVM. This shows that the feature extraction ability of nonlinear subspace transformation is better than that of linear subspace transformation and that the feature set contains more nonlinear information. In addition, the KLDA-SVM algorithm has the highest average detection accuracy, 88.33%. The advantage of the KLDA algorithm lies in the guidance of supervision information. Under the same condition of retaining 30-dimensional features, the KLDA algorithm can capture features with more significant discrimination and stronger classification ability. Therefore, compared with the KPCA-SVM algorithm, the average detection accuracy of KLDA-SVM is higher by about 5.6 percentage points (88.33% versus 82.7%).
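As a quick arithmetic check on the reported figures, the best single-run accuracies are consistent with whole numbers of correct decisions on the 60-sample test set. The counts below (51, 53, and 55) are inferred from the percentages and are not reported in the paper; the function name is hypothetical.

```python
def accuracy_percent(correct, total):
    """Detection accuracy in percent for a given number of correct decisions."""
    return 100.0 * correct / total


TEST_SET_SIZE = 60  # test samples, as stated for the test dataset

# Inferred counts that reproduce the reported best accuracies.
best_runs = {
    "PCA-SVM": 51,
    "LDA-SVM / KPCA-SVM": 53,
    "KLDA-SVM": 55,
}
for name, correct in best_runs.items():
    print(name, round(accuracy_percent(correct, TEST_SET_SIZE), 2))

# Gap between the reported average accuracies of KLDA-SVM and KPCA-SVM.
average_gain = round(88.33 - 82.7, 2)
print("average gain (percentage points):", average_gain)
```

The three lines print 85.0, 88.33, and 91.67, and the gap between the two reported averages is 5.63 percentage points.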

Conclusions
To detect series arc faults accurately and efficiently, a detection algorithm based on CEEMDAN decomposition and sensitive IMF selection is proposed. In this paper, a series arc generation platform is built and the current data of four kinds of loads are collected. Based on the CEEMDAN algorithm, arc current decomposition is implemented and a frequency division strategy is proposed to realize a rough selection of the high-frequency signals. Then, an accurate selection strategy for the sensitive IMF component is proposed, which eliminates the interference of invalid IMF components and reduces the complexity of fault detection. A local feature construction method based on the time window is proposed to realize local feature extraction of the sensitive IMF component and enhance the contrast and discrimination of arc current features. Subspace transformation is used to extract secondary features and reduce the dimension, and the support vector machine is used to detect the series fault arc. The optimal average detection accuracy of the proposed algorithm is 88.33%, which demonstrates the effectiveness of the proposed algorithm and provides an important reference for fault arc detection technology and device design.

Data Availability
The dataset used in the experiment can be obtained by contacting the corresponding author.