An Efficient Feature Extraction Method Based on Entropy for Power Quality Disturbance

This study explores the applicability of entropy, a thermodynamic state variable introduced by the German physicist Rudolf Clausius, and presents the concept and application of this state variable as a measure of system disorganization. An entropy-based feature analysis method for power quality disturbance analysis is then proposed. Feature extraction from a disturbed power signal provides information that helps to identify the fault responsible for the power quality disturbance. A precise and fast feature extraction tool helps power engineers to monitor and mitigate power disturbances more efficiently. Firstly, the decomposition coefficients are obtained by applying 10-level wavelet multi-resolution analysis to the signals (normal, sag, swell, outage, harmonic, sag with harmonic and swell with harmonic) generated from parametric equations. Secondly, distinctive features for each signal are extracted by applying the energy, Shannon entropy and log-energy entropy methods to the decomposition coefficients, and a combined feature vector is obtained from the standard deviation of these features. Finally, the entropy methods detect the different types of power quality disturbance.


INTRODUCTION
Electric Power Quality (PQ) has been a topic of consideration for the last two decades. Interest in improving the power quality of the electric supply has grown because phenomena such as sag, swell and momentary interruption have become common problems in recent years. The proper diagnosis of PQ problems requires a high level of engineering expertise. Adding to the difficulty of PQ diagnosis is that the required expert knowledge is not in any one area but rather in many areas of electric power, such as electric drives, rotating machines, transformers, power electronics, power supplies, capacitor switching, protection, power system faults, harmonics, signal analysis and measuring instruments (Dugan et al., 2002).
Recent advances in signal analysis have led to the development of new methods for characterizing and identifying various power quality disturbance signals. The quality of power supplies has become a major concern for electric utilities and end users due to the competitive electricity markets under deregulation. To study power quality problems, it is necessary to monitor the voltage and current waveforms and identify the different disturbances from these measurements (He and Starzyk, 2006). After detecting a disturbance, it is necessary to identify its source. Power quality problems include voltage sags, voltage swells, notches, spikes, switching transients, impulses, flicker and harmonics.
Voltage sag and swell are not complete interruptions of power; they are partial changes in the waveform. When the normal voltage increases by 10 to 90%, it is known as a voltage swell (Sushama et al., 2007). Most voltage sags do not go below 50% of the nominal voltage and they normally last from 3 to 10 cycles, or 50 to 170 ms. Voltage sag occurs due to a fault, the switching of a heavy load or the starting of large motors.
As a general tool for monitoring and analyzing PQ events, wavelet transform analysis has been used extensively. Most of the techniques have shown good efficiency in dealing with various power disturbances in both the time and frequency domains. In particular, the Discrete Wavelet Transform (DWT) technique with Multi-resolution Signal Decomposition (MSD) (Gaouda et al., 1999) provides information to detect and localize power disturbances in the time-frequency domain.
Wavelet transform: A wavelet is a wave-like oscillation with amplitude that starts out at zero, increases and then decreases back to zero. It can typically be visualized as a "brief oscillation" like one might see recorded by a seismograph or heart monitor. Wavelets can be combined, using a "shift, multiply and sum" technique called convolution, with portions of an unknown signal to extract information from the unknown signal (Gaouda et al., 2002).
A wavelet transform is the representation of a function by wavelets. The wavelets are scaled and translated copies (known as "daughter wavelets") of a finite-length or fast-decaying oscillating waveform (known as the "mother wavelet"). Wavelet transforms have advantages over traditional Fourier transforms for representing functions that have discontinuities and sharp peaks and for accurately deconstructing and reconstructing finite, non-periodic and/or non-stationary signals. The wavelet transform is capable of providing time and frequency information simultaneously, hence giving a time-frequency representation of the signal. Wavelet transforms are classified into Discrete Wavelet Transforms (DWTs) and Continuous Wavelet Transforms (CWTs). Both DWT and CWT are continuous-time (analog) transforms. They can be used to represent continuous-time (analog) signals. CWTs operate over every possible scale and translation, whereas DWTs use a specific subset of scale and translation values, or representation grid (Santoso et al., 1996).

Continuous wavelet transform:
The continuous wavelet transform of a signal x (t) is defined as:

$$CWT_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt$$

where, x (t) is the signal to be analyzed and ψ (t) is the mother wavelet or basis function. All the wavelet functions used in the transformation are derived from the mother wavelet through translation (shifting) and scaling (dilation or compression) (Sidney Burrus et al., 1998):

$$\psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t-\tau}{s}\right)$$

The mother wavelet used to generate all the basis functions is designed based on some desired characteristics associated with that function. The translation parameter τ relates to the location of the wavelet function as it is shifted through the signal. Thus, it corresponds to the time information in the wavelet transform. The scale parameter s is defined as |1/frequency| and corresponds to frequency information. Scaling either dilates (expands) or compresses a signal. Large scales (low frequencies) dilate the signal and provide global information about the signal, while small scales (high frequencies) compress the signal and provide detailed information hidden in the signal. Notice that the wavelet transform merely performs the convolution of the signal and the basis function. This analysis is very useful because, in most practical applications, high frequencies (low scales) do not last for a long duration but instead appear as short bursts, while low frequencies (high scales) usually last for the entire duration of the signal.
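To make the roles of the translation τ and the scale s concrete, the following minimal Python sketch approximates single CWT coefficients by a Riemann sum, assuming the usual definition CWT(τ, s) = (1/√|s|) ∫ x(t) ψ((t − τ)/s) dt. The Mexican-hat mother wavelet, the test signal (10 Hz followed by 50 Hz), the two scales and all function names are illustrative assumptions and are not taken from the study.

```python
import math

def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet; an illustrative choice only."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_coefficient(x, dt, tau, s, psi=mexican_hat):
    """Riemann-sum approximation of (1/sqrt|s|) * integral x(t) psi((t - tau)/s) dt."""
    acc = 0.0
    for n, xn in enumerate(x):
        acc += xn * psi((n * dt - tau) / s)
    return acc * dt / math.sqrt(abs(s))

def mean_abs_cwt(x, dt, s, t_start, t_stop, step=0.003):
    """Average |CWT| at scale s over translations tau in [t_start, t_stop]."""
    count = int(round((t_stop - t_start) / step)) + 1
    taus = [t_start + i * step for i in range(count)]
    return sum(abs(cwt_coefficient(x, dt, tau, s)) for tau in taus) / len(taus)

# Assumed test signal: 10 Hz during the first 0.5 s, 50 Hz afterwards.
dt = 0.001
x = [math.sin(2.0 * math.pi * (10.0 if n * dt < 0.5 else 50.0) * n * dt)
     for n in range(1000)]

# Assumed scales: the Mexican hat responds most strongly near f = 0.225 / s.
small_scale, large_scale = 0.0045, 0.0225

small_early = mean_abs_cwt(x, dt, small_scale, 0.1, 0.4)  # 10 Hz region
small_late = mean_abs_cwt(x, dt, small_scale, 0.6, 0.9)   # 50 Hz region
large_early = mean_abs_cwt(x, dt, large_scale, 0.1, 0.4)
large_late = mean_abs_cwt(x, dt, large_scale, 0.6, 0.9)

print("small scale:", small_early, small_late)
print("large scale:", large_early, large_late)
```

With these assumed settings, the small scale responds mainly to the 50 Hz section and the large scale to the 10 Hz section, which is the inverse relation between scale and frequency described in the text.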

Discrete wavelet transform:
The Discrete Wavelet Transform (DWT) is an implementation of the wavelet transform using a discrete set of wavelet scales and translations obeying some defined rules. In other words, this transform decomposes the signal into a mutually orthogonal set of wavelets, which is the main difference from the Continuous Wavelet Transform (CWT) or from its implementation for discrete time series, sometimes called the Discrete-Time Continuous Wavelet Transform (DT-CWT).
The wavelet can be constructed from a scaling function which describes its scaling properties. The restriction that the scaling function must be orthogonal to its discrete translations imposes mathematical conditions on it, such as the dilation equation, that are widely documented. Figures 1 and 2 show the input signal and the corresponding wavelet transformed signal. Figures 3 and 4 show a sinusoidal signal having two different frequency components at two different times and the corresponding continuous wavelet transform of that signal. The frequency axis in the plot is labeled as scale. It should be noted that scale is the inverse of frequency. That is, high scales correspond to low frequencies and low scales correspond to high frequencies. Consequently, the little peak in the plot corresponds to the high frequency components in the signal and the large peak corresponds to the low frequency components (which appear before the high frequency components in time). The scale parameter in wavelet analysis is similar to the scale used in maps. As in the case of maps, high scales correspond to a non-detailed global view (of the signal) and low scales correspond to a detailed view. Similarly, in terms of frequency, low frequencies (high scales) correspond to global information about a signal (that usually spans the entire signal), whereas high frequencies (low scales) correspond to detailed information about a hidden pattern in the signal (that usually lasts a relatively short time). The frequency resolution shown in the plot may seem puzzling, since it appears to be good at high frequencies. It should be noted, however, that it is the scale resolution that looks good at high frequencies (low scales), and good scale resolution means poor frequency resolution and vice versa.
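The orthogonal decomposition performed by the DWT can be illustrated with the simplest case, the Haar wavelet. The sketch below is a minimal, dependency-free illustration of one analysis level and its inverse; the function names and the sample values are assumptions made for the example, not part of the study.

```python
import math

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    r = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * r for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    r = 1.0 / math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) * r)
        x.append((a - d) * r)
    return x

# Hypothetical samples chosen only to show the behaviour.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(signal)
restored = haar_idwt(approx, detail)

energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in approx) + sum(v * v for v in detail)
print(approx, detail)
print(energy_in, energy_out)
```

Because the Haar basis is orthonormal, the energy of the coefficients matches the energy of the signal and the original samples are recovered up to rounding, which is the property that later makes per-level features comparable across levels.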

METHODOLOGY
The proposed entropy-based features analysis approach: Entropy-based features method: Entropy as a thermodynamic state variable was introduced into physics by the German physicist Rudolf Clausius in the second half of the 19th century. It was originally defined as:

$$dS = \frac{\delta Q}{T} \qquad (1)$$

where, dS is an elementary change of entropy, δQ is a reversibly received elementary heat and T is the absolute temperature. Of course, such a definition has no direct meaning for signal processing. However, it started the diffusion of entropy as a term into other areas. Entropy as a measure of system disorganization appeared for the first time in connection with the first postulate of thermodynamics: "Any macroscopic system which is in time t0 in given time-invariant outer conditions will reach after a relaxation time the so-called thermodynamic equilibrium. It is a state in which no macroscopic processes proceed and the state variables of the system gains constant time-invariant values." The entropy of a system is maximal when the system has reached thermodynamic equilibrium.
This key idea promoted entropy to a generic measure of system disorganization. From the above analysis, we can see that the characteristics of the original waveform are reflected at different scales after the MRA decomposition (Hamid and Kawasaki, 2002). Based on this observation, we can construct a feature vector to detect different kinds of PQ disturbances. This idea is shown in Fig. 5. The sampled waveform is decomposed into different resolution levels (i) according to MRA. Then the entropy of the detail information at each decomposition level (i) is calculated according to Eq. (2), where x = {x1, x2, . . . , xN} is the set of wavelet (detail) coefficients in the wavelet decomposition from level 1 to level l, N is the total number of coefficients at each decomposition level and E (i) is the entropy of the detail at decomposition level (i). In order to identify different kinds of PQ disturbances, the Entropy Difference (ED) at each decomposition level is calculated, which is the difference between the entropy E (i) and the corresponding entropy of the reference (normal) waveform at this level, Eref (i):

$$ED(i) = E(i) - E_{ref}(i) \qquad (3)$$

By observing this ED (i) feature vector at different resolution levels and following the criteria proposed later in this section, one can effectively detect, localize and classify different kinds of PQ disturbances. This method is named the Entropy Difference of MRA method.

Figure: Wavelet transform applied to the sag signal; a1 is the first level approximation coefficient and d1, d2, d3 and d4 are the detail coefficients.

The major advantages of this method include two aspects. The first is that this method significantly reduces the dimensionality of the analyzed data. As we can see from Fig. 5, for an l-level multi-resolution decomposition, only an l-dimensional feature vector needs to be observed. This is a significant reduction compared to the original sampled waveform.
The second advantage is that this method keeps all the necessary characteristics of the original waveform for analysis. Different PQ characteristics are represented by the entropy difference at different resolution scales, which provides an effective way of detecting different types of PQ disturbance. Figure 6a shows a normal pure sine wave (60 Hz) and four typical types of PQ disturbance: low frequency distortion, high frequency distortion, voltage sag and voltage swell. The sampling frequency used is 5 kHz. These PQ disturbance models are based on IEEE Standard 1159-1995 (IEEE Recommended Practice for Monitoring Electric Power Quality), which is widely adopted in the academic and industrial communities. Specifically, for the short duration variations, the typical duration of a voltage sag is from 0.5 to 30 cycles with a voltage magnitude between 0.1 and 0.9 pu, while the typical duration of a voltage swell is from 0.5 to 30 cycles with a voltage magnitude between 1.1 and 1.8 pu. For the frequency distortions, the typical spectral content of a low frequency distortion is less than 5 kHz with a voltage magnitude of 0-4 pu, while for high frequency distortions the typical spectral content is between 0.5 and 5 MHz with a voltage magnitude of 0-4 pu (O'shea, 2002). Detailed mathematical models of these typical PQ characteristics are available in the literature. In our current study, we use the Daubechies4 (Db4) wavelet and a 4-level decomposition for analysis. Since it is well known that the wavelet transform can localize the time information of PQ disturbances, we focus here on the disturbance characteristics.
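The feature extraction described above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the sag and swell waveforms use a hypothetical parametric model (a unit sine whose amplitude changes by alpha inside a sample window), the Haar wavelet replaces Db4 to keep the code dependency-free, and the entropy is assumed to be the non-normalized Shannon form −Σ x² log(x²), with the log-energy form Σ log(x²) as an alternative. All function names are made up for the example.

```python
import math

def pq_waveform(kind, n_samples=1024, fs=5000.0, f0=60.0,
                alpha=0.5, start=256, stop=768):
    """Unit sine with an optional sag or swell of depth alpha between the
    sample indices start and stop (hypothetical parametric model)."""
    gain = {"normal": 0.0, "sag": -alpha, "swell": alpha}[kind]
    out = []
    for n in range(n_samples):
        amp = 1.0 + (gain if start <= n < stop else 0.0)
        out.append(amp * math.sin(2.0 * math.pi * f0 * n / fs))
    return out

def haar_details(x, levels):
    """Detail coefficients [d1, ..., d_levels] of a Haar multi-resolution analysis."""
    r = 1.0 / math.sqrt(2.0)
    approx, details = list(x), []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        evens, odds = approx[0::2], approx[1::2]
        details.append([(a - b) * r for a, b in zip(evens, odds)])
        approx = [(a + b) * r for a, b in zip(evens, odds)]
    return details

def shannon_entropy(coeffs):
    """Assumed form: -sum(c^2 * log(c^2)), with 0 * log(0) taken as 0."""
    total = 0.0
    for c in coeffs:
        p = c * c
        if p > 0.0:
            total -= p * math.log(p)
    return total

def log_energy_entropy(coeffs):
    """Assumed form: sum(log(c^2)), with log(0) taken as 0."""
    total = 0.0
    for c in coeffs:
        p = c * c
        if p > 0.0:
            total += math.log(p)
    return total

def entropy_difference(signal, reference, levels, entropy=shannon_entropy):
    """ED(i) = E(i) - Eref(i) for each decomposition level i = 1..levels."""
    sig = haar_details(signal, levels)
    ref = haar_details(reference, levels)
    return [entropy(s) - entropy(r) for s, r in zip(sig, ref)]

reference = pq_waveform("normal")
for kind in ("sag", "swell"):
    ed = entropy_difference(pq_waveform(kind), reference, levels=4)
    print(kind, [round(v, 4) for v in ed])
```

The result is one number per decomposition level, so a 1024-sample waveform is reduced to a 4-dimensional feature vector here, which mirrors the dimensionality reduction claimed for the method. Passing `entropy=log_energy_entropy` switches the feature without changing the rest of the pipeline.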

ANALYSIS AND PERFORMANCE EVALUATION OF THE PROPOSED METHOD
Interested readers can refer to the literature for the detection of the beginning and ending time of the power quality disturbance. Figure 7 gives the Entropy-based Features analysis result for the signals in Fig. 6a to b, where the horizontal axis represents the decomposition level (scale) and the vertical axis is the entropy difference as defined in Eq. (3). Based on the analysis result in Fig. 7, the following criteria are proposed for detecting and classifying different kinds of PQ disturbances.

Conjecture 1:
• If the peak-value of the ED is located at scale 6 (curve c and d), it is an amplitude distortion, which means either a swell or a sag disturbance. Otherwise, it is a frequency distortion.
• If the triangle (peak-value) is concave downward (negative ED), the distortion is a swell. If the triangle is concave upward (positive ED), the distortion is a sag.
• If the peak-value is at a scale smaller than 6 (curve b), it is a high-frequency disturbance.
• If the peak-value is at a scale higher than 6 (curve a), it is a low-frequency disturbance.
It should be noted that the reference scale 6 is related to the sample frequency fs and the normal frequency of the power signal (50 or 60 Hz); this is discussed in detail below.
Determination of the MDL: Using the methodology presented so far, we can detect, localize and classify different kinds of PQ disturbances. However, how many levels of decomposition are enough for the proposed method to be effective? Obviously, more levels of decomposition increase the computational cost. In this part, we aim to find the Minimum Decomposition Level (MDL) for the proposed method and modify the above evaluation criteria to be universal (Fig. 6b).
In MRA, since both the high pass filter and the low pass filter are half band, the decomposition in the frequency domain for a signal sampled with the sample frequency fs can be demonstrated as in Fig. 8. Assuming the total number of decomposition levels for the Entropy-based Features method is l, Table 1 shows the frequency range at each decomposition level based on the frequency decomposition shown in Fig. 8.
In the proposed Entropy-based Features method, only the detail information at each decomposition level is needed. Assume that the normal reference frequency of the power signal is fref (50 or 60 Hz) and that we want to locate the energy of this reference signal at level N. According to Table 1, we have:

$$\frac{f_s}{2^{N+1}} < f_{ref} < \frac{f_s}{2^{N}} \qquad (4)$$

From Eq. (4), we get:

$$\log_2\left(\frac{f_s}{f_{ref}}\right) - 1 < N < \log_2\left(\frac{f_s}{f_{ref}}\right) \qquad (5)$$

Since we need to locate the energy of the reference frequency at the center of the final result (level 6 as shown in Fig. 7), the MDL for the proposed method, denoted as Nmin, is found by:

$$N_{min} = 2N \qquad (6)$$

For instance, in Fig. 6, the sampling frequency is fs = 5000 Hz and fref = 60 Hz. According to Eq. (5), we get:

5.3808 < N < 6.3808

We should choose N = 6. According to Eq. (6), we get Nmin = 2 * N = 12. This is the reason why we choose 12 levels of decomposition in Fig. 7 and why the evaluation reference scale is 6 in the criteria of Conjecture 1.
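The choice of N and Nmin can be reproduced with a short calculation. The sketch below assumes the bound on N is log2(fs/fref) − 1 < N < log2(fs/fref), which matches the numerical range 5.3808 < N < 6.3808 quoted for fs = 5000 Hz and fref = 60 Hz, and that Nmin = 2N. The function name is hypothetical.

```python
import math

def minimum_decomposition_level(fs, f_ref):
    """Return (N, N_min) for a sampling frequency fs and a reference
    frequency f_ref, both in Hz, assuming N_min = 2 * N."""
    if fs <= 0 or f_ref <= 0:
        raise ValueError("frequencies must be positive")
    upper = math.log2(fs / f_ref)  # upper bound of the inequality on N
    # Largest integer not exceeding the bound. When fs / f_ref is an exact
    # power of two no integer lies strictly inside the range, and this
    # sketch simply returns the boundary value.
    n = math.floor(upper)
    if n < 1:
        raise ValueError("fs is too low to resolve f_ref in a detail band")
    return n, 2 * n

for fs in (5000.0, 2500.0):
    n, n_min = minimum_decomposition_level(fs, 60.0)
    print(f"fs = {fs:.0f} Hz: N = {n}, Nmin = {n_min}")
```

For fs = 5000 Hz this gives N = 6 and Nmin = 12, and for fs = 2500 Hz it gives N = 5 and Nmin = 10, in agreement with the values used in the text.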
Table 1: Frequency range of the detail information at each decomposition level
Decomposition level | Detail information (D)
1 | fs/2^2 ~ fs/2^1
2 | fs/2^3 ~ fs/2^2
… | …
n | fs/2^(n+1) ~ fs/2^n

For any actual sampled signal, Eq. (5) and (6) give the most efficient way of finding the number of decomposition levels for the Entropy-based Features method. For example, if we sample the same waveform in Fig. 6 with sampling frequency fs = 2500 Hz, we find that only 10 levels of decomposition are necessary according to Eq. (5) and (6). Figure 9 gives the analysis result of the 10-level decomposition for the same signal as shown in Fig. 4. Comparing the analysis results in Fig. 10 and 5, we can see that a 10-level decomposition is enough to classify various types of PQ disturbances in this situation (fs = 2500 Hz). In this way, we save about 17% of the computational time compared to the 12 levels of decomposition. Now, we can modify the evaluation criteria of Conjecture 1 into a universal form.

Conjecture 2:
• If the peak-value of the ED is located at scale Nmin/2, the disturbance is either sag or swell (negative ED means swell and positive ED means sag).
• If the peak-value is at a scale smaller than Nmin/2, the disturbance is a high frequency distortion; otherwise, it is a low frequency distortion.
The regions of this classification are shown in Fig. 10.
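The decision rule can be written as a small function. The sketch below assumes the reference scale is Nmin/2 (scale 6 when Nmin = 12), that the peak is the ED entry with the largest magnitude, and that negative ED means swell and positive ED means sag, as stated in the criteria. The function name and the ED vectors are hypothetical and serve only to exercise each branch.

```python
def classify_disturbance(ed, n_min):
    """Classify a disturbance from its ED vector; ed[i - 1] is ED at scale i."""
    if len(ed) != n_min:
        raise ValueError("expected one ED value per decomposition level")
    ref_scale = n_min // 2
    # Index of the entry with the largest magnitude (first one on ties).
    peak = max(range(n_min), key=lambda i: abs(ed[i]))
    scale = peak + 1
    if scale == ref_scale:
        return "swell" if ed[peak] < 0 else "sag"
    if scale < ref_scale:
        return "high-frequency distortion"
    return "low-frequency distortion"

def ed_with_peak(scale, value, n_min=12):
    """Hypothetical ED vector that is zero except for one peak."""
    ed = [0.0] * n_min
    ed[scale - 1] = value
    return ed

print(classify_disturbance(ed_with_peak(6, 3.0), 12))
print(classify_disturbance(ed_with_peak(6, -3.0), 12))
print(classify_disturbance(ed_with_peak(3, 2.0), 12))
print(classify_disturbance(ed_with_peak(9, 2.0), 12))
```

Keeping the reference scale as a function of Nmin, rather than hard-coding 6, is what lets the same rule apply when the sampling frequency changes.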

Choice of a suitable wavelet:
It is well known that the choice of an appropriate wavelet is very important for all wavelet-based PQ analyses. In this part, we investigate the influence of different kinds of wavelets on the proposed Entropy-based Features method. Four commonly used wavelet families, namely Haar, Daubechies, Symlets and Coiflets, are taken into account. Notice that the Haar wavelet is the Daubechies wavelet with N = 1. Table 2 shows their corresponding features and Fig. 9a to d shows the Entropy-based Features analysis result for the previous distorted signal in Fig. 4 (fs = 5000 Hz, MDL = 12). From Fig. 9a to d, we can see that although different wavelets have some influence on the final analysis result, all these wavelets keep good resolution for classifying various types of PQ disturbances. On this basis, we can claim that the Entropy-based Features method has good robustness. However, it is very important to evaluate the corresponding computational cost of the different wavelets. This provides useful information for actual applications, considering the tradeoff between performance and time cost.
Assume that the input discrete signal x (t) is represented by a vector of length N = 2^K. For the DWT using a wavelet with M filter coefficients, we need to compute:

$$y_A[n] = \sum_{k=0}^{M-1} c[k]\, x[2n-k], \qquad y_D[n] = \sum_{k=0}^{M-1} d[k]\, x[2n-k]$$

where, y A is the low pass component (approximation), y D is the high pass component (details) and c [k] and d [k] are the low pass (approximation) filter and high pass (detail) filter coefficients as defined in the dilation equations. M is the filter length (total number of filter coefficients). Notice that when the DWT is applied to a discrete signal (a vector), the computation is simply the convolution of two vectors, the signal and the filter coefficients. After a type of wavelet is chosen, the length of the wavelet filter is kept the same at all levels. For this reason, the computational complexity depends mainly on the length of the wavelet filters. Since the mother wavelet produces all wavelet functions (via the dilation equations) used in the transformation through translation and scaling, it determines the characteristics (such as smoothness and symmetry) of the resulting wavelet transform. Therefore, the details of a particular application should be taken into account and the appropriate mother wavelet should be chosen in order to use the wavelet transform effectively. For example, the Coiflets have an important near-symmetry property which is highly desired in image processing, since they correspond to nearly linear-phase filters. When computational cost is the main factor under consideration, the filter length should be chosen as short as possible.
From the above, we see that the time complexity is proportional to the number of filter coefficients, thus wavelets with a larger number of coefficients take longer to compute. To gain an empirical understanding of the computational complexity of different wavelet families, we conducted experiments in MATLAB 7.5 on an Intel Pentium 4 1.8 GHz processor using a stopwatch timer. The computational time for the Haar wavelet is chosen as the base for evaluation and we define the following variable Δt% for reference:

$$\Delta t\% = \frac{t_{wavelets} - t_{haar}}{t_{haar}} \times 100\% \qquad (10)$$

where, twavelets is the computational time of the Entropy-based Features method with the specific chosen wavelet and thaar is the reference computation time of the Entropy-based Features method with the Haar wavelet. Δt% indicates how much more computational time the specific choice of wavelet needs for the Entropy-based Features method compared to the Haar wavelet. Based on the above definition, Table 2 gives the results according to Eq. (10). Based on the data in Table 2, although the Haar wavelet has the smallest computational cost, further investigation shows that the Haar wavelet has difficulty localizing the beginning and ending time of the disturbance. Therefore, in situations where the classification of different kinds of PQ disturbances is the only concern, we can choose the Haar wavelet for the Entropy-based Features method to avoid unnecessary computational cost. Otherwise, based on the data in Table 2, we recommend using wavelets with a shorter filter length, such as the Db2, Sym2 or Coif1 wavelets, for practical applications, since they have a relatively small computational cost as well as more reliable analysis performance. Of course, this selection should also be based on their noise tolerance performance in actual applications, which is discussed in detail in the following section.
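The dependence of the cost on the filter length M can be checked by counting multiplications in a direct implementation of one analysis level. The sketch below is illustrative only: it assumes the decimated convolution form y[n] = Σ c[k] x[2n − k] with periodic extension of the signal, uses the standard Haar and Db2 filter coefficients, and reads Δt% as the relative increase (t_wavelets − t_haar) / t_haar × 100. The function names and the timing values passed to `delta_t_percent` are hypothetical.

```python
import math

def analysis_level(x, c, d):
    """One DWT analysis level by direct decimated convolution with periodic
    extension. Returns (y_A, y_D, number of multiplications)."""
    n, m = len(x), len(c)
    if len(d) != m or n % 2:
        raise ValueError("filters must match in length and len(x) must be even")
    y_a, y_d, mults = [], [], 0
    for i in range(n // 2):
        a = b = 0.0
        for k in range(m):
            sample = x[(2 * i - k) % n]
            a += c[k] * sample
            b += d[k] * sample
            mults += 2
        y_a.append(a)
        y_d.append(b)
    return y_a, y_d, mults

def high_pass_from(c):
    """Quadrature mirror relation d[k] = (-1)^k * c[M - 1 - k]."""
    m = len(c)
    return [((-1) ** k) * c[m - 1 - k] for k in range(m)]

def delta_t_percent(t_wavelet, t_haar):
    """Assumed reading of the relative extra time with respect to Haar."""
    return (t_wavelet - t_haar) / t_haar * 100.0

r = 1.0 / math.sqrt(2.0)
s3 = math.sqrt(3.0)
haar = [r, r]
db2 = [(1 + s3) / (4 * math.sqrt(2.0)), (3 + s3) / (4 * math.sqrt(2.0)),
       (3 - s3) / (4 * math.sqrt(2.0)), (1 - s3) / (4 * math.sqrt(2.0))]

x = [1.0] * 8  # constant test signal, so every detail coefficient vanishes
_, haar_detail, haar_mults = analysis_level(x, haar, high_pass_from(haar))
_, db2_detail, db2_mults = analysis_level(x, db2, high_pass_from(db2))
print(haar_mults, db2_mults)
print(delta_t_percent(1.5, 1.0))  # hypothetical timings in seconds
```

The multiplication count per level is len(x) × M in this direct form, so doubling the filter length from Haar (M = 2) to Db2 (M = 4) doubles the work, which is consistent with the recommendation to prefer short filters when cost matters.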
Noise tolerance analysis for entropy-based features method: Since noise is omnipresent in electrical power distribution networks, we analyze whether the proposed Entropy-based Features method is still effective in a noisy environment. Two types of noise, namely Gaussian white noise and band limited spectrum noise are considered (Zhang et al., 2003).
Gaussian white noise: Gaussian white noise has been considered for power quality disturbance analysis. In this section, we suppose that the noise riding on the sampled signal for Entropy-based Features analysis follows a white Gaussian distribution. Here we focus on the detection and classification of different kinds of PQ disturbances under different Signal to Noise Ratios (SNR). A detailed discussion of the localization of the beginning and ending time of the disturbances in a noisy environment, in which an adaptive threshold of wavelet analysis is proposed to eliminate the noise influence, can be found in the literature. Figure 11 shows the Entropy-based Features analysis result for the signal in Fig. 4 in the noisy environment with SNR = 20 dB. Comparing Fig. 11 with Fig. 5, we can see that the Entropy-based Features method has good anti-noise performance and allows us to correctly classify different kinds of PQ disturbances in the noisy environment. To test the performance of the method in different noise environments, we use the Monte-Carlo method to obtain the average correct classification rate when the SNR varies from 20 to 50 dB. The value of SNR is defined as follows:

$$SNR = 10 \log_{10}\left(\frac{P_s}{P_n}\right) \text{ dB} \qquad (11)$$

where, P s is the power (variance) of the signal and P n is that of the noise. For each PQ disturbance, 100 cases with different parameters were simulated for each choice of wavelet. The average correct classification rate according to the evaluation criteria proposed above is calculated for different wavelets under different SNR.
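A noisy test case at a prescribed SNR can be generated as follows. This is a minimal sketch, assuming SNR = 10 log10(Ps / Pn) in dB with Ps and Pn taken as the variances of the signal and the noise; the function names, the seed and the 60 Hz test waveform are assumptions made for the example.

```python
import math
import random

def variance(values):
    """Population variance, used here as the signal or noise power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise whose variance gives the requested SNR."""
    p_n = variance(signal) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_n)
    return [v + rng.gauss(0.0, sigma) for v in signal]

def measured_snr_db(clean, noisy):
    """SNR in dB estimated from a clean signal and its noisy version."""
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10.0 * math.log10(variance(clean) / variance(noise))

rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [math.sin(2.0 * math.pi * 60.0 * n / 5000.0) for n in range(5000)]
noisy = add_white_noise(clean, 20.0, rng)
print(round(measured_snr_db(clean, noisy), 2))
```

In a Monte-Carlo study of the kind described, this step would be repeated with fresh noise for every simulated case and every SNR value before the features are extracted.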
Band limited spectrum noise: In real electrical distribution networks, noise caused by power electronic devices, control circuits, loads with solid-state rectifiers and switching power supplies is not Gaussian white noise. It has been shown that power quality noise can be defined as electrical signals with broadband spectral content lower than 200 kHz superimposed upon the signal. In this research, we consider a band limited noise spectrum close to the fundamental frequency (60 Hz). Figure 10 shows the Entropy-based Features analysis result (Db4 wavelet, SNR = 20 dB) for the distorted signal combined with band limited noise (Yang and Liao, 2001). As we can see, the Entropy-based Features method still shows good performance in this kind of noise environment. To test the performance of different wavelets at different SNR within the band limited noise, Fig. 10 shows the average correct classification probability obtained with the Monte-Carlo method for different SNR.
Based on the analysis of the experimental results in Fig. 9 and 10, we conclude that the Entropy-based Features method is not noise sensitive and performs well in noisy environments. Although different types of noise and SNR have some influence on its performance, the Entropy-based Features method consistently achieves a high correct classification probability, as presented in Fig. 9 and 10.

CONCLUSION
This study considers an efficient feature extraction approach to classifying PQ disturbances. In this approach, three feature vectors for each disturbance signal are first obtained by using different feature extraction methods and are examined to evaluate classification performance. Then, a combined feature vector is obtained from the standard deviation of the features belonging to these methods. The experimental results show that the proposed combined feature vector can effectively classify PQ disturbances. Moreover, the proposed method can reduce the number of features extracted from a disturbance signal without losing its properties. Thus, a classifier system based on the proposed feature extraction method needs less memory space and less computing time in both the training and testing processes. The entropy feature method can separate the critical frequency components of a power quality signal efficiently and appears to perform better as a feature extraction tool than other signal analysis tools.