Noninvasive Load Identification Method Based on Feature Similarity

Traditional power load identification is greatly restricted in application because of its high cost and low efficiency. In this paper, a similarity model is established to realize noninvasive power load identification by building a feature database for the equipment. Firstly, the wavelet decomposition method and the wavelet threshold processing method are used to remove abnormal points and reduce noise in the original data, respectively. Secondly, the transient and steady-state characteristics of electrical equipment (active power and reactive power, harmonic current, and voltage-current trajectory) are extracted, and the feature database for the equipment is established. Thirdly, the feature similarity is defined to describe the degree of similarity of any two devices under a given feature, and a similarity model for automatic recognition of a single device is established. Finally, device identification and calculation of power consumption are carried out for part of the data in annex 2 of question A in the 6th “teddy cup” data mining challenge competition.


Introduction
With the continual emergence of new types of power load components, users place higher requirements on the reliability, safety, economy, and stability of the power system. The smart grid emphasizes bidirectional interaction with users and encourages users to participate in power management through demand response, which depends on a detailed grasp of load operation information. Since the traditional invasive load monitoring system is costly in time and investment and has a certain impact on the reliability of the system, it is necessary to develop an economical and effective noninvasive load monitoring and identification system. Hence, strengthening the monitoring of building power consumption is of great practical significance for energy conservation and the smart grid.
Noninvasive load monitoring technology has attracted much attention from power companies and scientific research institutions since it was proposed. Hart [1] established the first noninvasive appliance load monitoring (NIALM) system, a monitoring tool that affects the monitored target as little as possible and can provide power companies with specific power consumption data for different electrical equipment. Li and Yu [2] further studied noninvasive load monitoring and determined characteristic parameters from fuzzy clustering results of the steady-state load characteristics of electrical appliances, so as to realize noninvasive load monitoring based on a differential evolution algorithm. Liang et al. [3,4] carried out a series of studies on load characteristics and comprehensively introduced the basic concept, system structure, feature methods, decomposition framework, system simulation applications, and other aspects of noninvasive load monitoring. Cai et al. [5] calculated the similarity between the transient waveform and the fixed characteristic template in the electrical load characteristic database, established the electrical load characteristic membership matrix based on similarity, and determined the characteristic type of electrical load. Zheng et al. [6] studied the microcharacteristics of noninvasive load monitoring, established a household load characteristics database, and analyzed the load characteristics and extraction methods contained in the fundamental wave and multiple harmonics of current, voltage, and power, but did not give a specific method for completing the noninvasive identification of users' electrical loads. Huang et al. [7] used instantaneous current and power waveforms, taking the decomposed current waveforms as the characteristic values of two similar loads, which enabled accurate identification of electrical appliances with similar current waveforms. Wu et al. [8] decomposed the sampled current to obtain the independent current generated by the start-up of electrical appliances and established a load identification algorithm based on entropy value discrimination to realize the decomposition and recognition of electrical loads. In practice, research on nonintrusive power load monitoring and decomposition mainly focuses on optimizing and improving electrical load feature extraction and load identification algorithms.
Noninvasive power load decomposition and monitoring refers to installing a sensor at the user's entrance to the grid; the device monitors the power consumption and working condition of each electrical device, or each type of device, by collecting and analyzing the total power or total current. Hence, power companies can understand the power consumption rules and usage patterns of each device, or each type of device, in the user's home, as shown in Figure 1.
The monitoring data of household power load provides a scientific basis for predicting load usage in the power system and supports correct decision-making [9]. This paper takes question A in the 6th "teddy cup" data mining challenge competition as the research background. First, the transient and steady-state characteristics of the electrical equipment are extracted from the original data and the equipment feature database is established; then, the similarity model is established to realize noninvasive power load detection. The data used to support the findings of this study are available at the teddy cup data mining challenge website (http://www.5iai.com/bdrace/tzjingsai/20170921/1253.html#sHref). Table 1 shows the known equipment data and parameters.

Abnormal Points Processing.
In this paper, the wavelet decomposition $W_k$ value method is adopted to detect and distinguish abnormal points and mutation points [10]. The specific algorithm is as follows:
Step 1. The fitting residuals $e_t$, $t = 1, 2, \ldots$, are decomposed online at two wavelet scales.
Step 2. The modulus of the wavelet decomposition coefficients at the two scales is calculated, and their difference is taken to obtain $E_k$.
Step 3. Abnormal points and mutation points are detected.
The active power data of YD1-YD11 were tested by the above outlier test method. Figure 2 shows the abnormal point test results of equipment YD4 in the period from 60 seconds to 290 seconds.
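The three steps can be illustrated with a minimal sketch. The text does not state the wavelet or the decision rule, so everything below is an assumption made for illustration: an undecimated Haar wavelet, a fixed detection threshold, and the common heuristic that the coefficient modulus of an isolated abnormal point decays from the fine scale to the coarse scale, whereas that of a genuine mutation (step change) grows. The names `haar_detail`, `classify_events`, `threshold`, and `window` are hypothetical.

```python
import math

def haar_detail(x, scale):
    """Undecimated Haar detail coefficients of x at dyadic scale 1, 2, ..."""
    half = 2 ** (scale - 1)           # samples in each half of the Haar window
    norm = math.sqrt(2 * half)        # gives the filter unit energy
    out = [0.0] * len(x)
    for k in range(2 * half - 1, len(x)):
        recent = sum(x[k - half + 1 : k + 1])
        earlier = sum(x[k - 2 * half + 1 : k - half + 1])
        out[k] = (recent - earlier) / norm
    return out

def classify_events(residuals, threshold, window=3):
    """Return (index, label) pairs; label is 'outlier' or 'mutation'."""
    w1 = haar_detail(residuals, 1)    # Step 1: decomposition at two scales
    w2 = haar_detail(residuals, 2)
    events, k, n = [], 0, len(residuals)
    while k < n:
        if abs(w1[k]) > threshold:
            stop = min(n, k + window + 1)
            m1 = max(abs(v) for v in w1[k:stop])   # Step 2: moduli per scale
            m2 = max(abs(v) for v in w2[k:stop])
            e_k = m2 - m1                          # difference between scales
            # Step 3 (assumed rule): growth with scale means a real change
            events.append((k, "mutation" if e_k > 0 else "outlier"))
            k = stop                               # skip the rest of this event
        else:
            k += 1
    return events

spike = [0.0] * 30
spike[10] = 10.0                      # isolated abnormal point
step = [0.0] * 10 + [10.0] * 20       # genuine change of operating state
print(classify_events(spike, threshold=3.0))
print(classify_events(step, threshold=3.0))
```

In practice the threshold would have to be tuned to the noise level of the fitting residuals; the fixed value here only serves the toy signals.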

Noise Reduction Processing.
Data noise reduction is performed by wavelet threshold processing [11]. Wavelet noise reduction separates the signal from the noise by exploiting their different behavior in the time and frequency domains, so as to obtain a better noise reduction effect.
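Wavelet threshold denoising of this kind can be sketched in a few lines. The paper performs this step in MATLAB and does not name the wavelet, the decomposition depth, or the threshold rule, so the choices below (one-level Haar transform, soft thresholding, the universal threshold with a median-based noise estimate) are assumptions for illustration only, and the function names are hypothetical.

```python
import math
import statistics

_S = 1 / math.sqrt(2)

def haar_dwt(signal):
    """One-level Haar transform; len(signal) must be even."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out

def soft_threshold(x, thr):
    """Shrink x toward zero by thr; values inside [-thr, thr] become 0."""
    return math.copysign(max(abs(x) - thr, 0.0), x)

def denoise(signal):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    # Noise intensity estimated from the median absolute detail coefficient
    sigma = statistics.median(abs(d) for d in detail) / 0.6745
    thr = sigma * math.sqrt(2 * math.log(len(signal)))   # universal threshold
    detail = [soft_threshold(d, thr) for d in detail]
    return haar_idwt(approx, detail)
```

A production pipeline would normally use several decomposition levels and a smoother wavelet; the single Haar level keeps the sketch short and dependency-free.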
Let the signal $S(t)$ be the noise-polluted observation of $X(t)$; its basic model can be expressed as

$$S(t) = X(t) + \sigma e(t),$$

where $e(t)$ is noise and $\sigma$ is noise intensity. After wavelet noise reduction, the processed data is obtained and the waveform is drawn in MATLAB. For brevity, one sampling period of YD1's cycle data is taken as an example here; the signal after noise reduction is shown in Figure 3.

Establishment of the Feature Database

As shown in Figure 4, the transient power waveform at an appliance's start-up is a typical load mark. The following part analyzes how the transient characteristics are computed and what load characteristics they capture, covering four features used in noninvasive load monitoring: the mean, the root-mean-square, the transition time of the transient, and the multiple of impulse power (current) [12].

(1) Mean. To calculate the mean value of signal $i(t)$, the signal waveform is integrated over a period of time:

$$\bar{i} = \frac{1}{T} \int_0^T i(t)\,dt,$$

where $T$ is the integral time.

(2) Root-Mean-Square. The root-mean-square represents the fluctuation of the signal about its mean value. The root-mean-square of signal $i(t)$ represents the effective value of an alternating waveform and is defined as

$$I_{rms} = \sqrt{\frac{1}{T} \int_0^T i^2(t)\,dt}.$$

(3) Transition Time. Let the start time of the transient process be $t_{ton}$ and the end time be $t_{toff}$; then the transition time $\Delta t$ is

$$\Delta t = t_{toff} - t_{ton}.$$

(4) Multiple of Impulse Power (Current). The multiple $K_P$ of impulse power (current) is calculated as

$$K_P = \frac{P_{peak} - P_{S1}}{P_{S2} - P_{S1}},$$

where $P_{peak}$ is the maximum power in the process of transient switching, $P_{S1}$ is the steady-state average power before the appliance is switched in, and $P_{S2}$ is the steady-state average power after the appliance is switched in.

Journal of Electrical and Computer Engineering

Applying the above definitions to the single-state data provided in Annex 1, the transient characteristics database shown in Table 2 is obtained.
As can be seen from Table 2, the way the electrical load changes from switch-on to the stable state varies between devices. A purely resistive load enters the steady state directly from the start, while other loads exhibit a pulse current whose starting time and size differ. Because the switching transients of different loads differ, the transient characteristics can be used to distinguish the electrical equipment.
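The four transient features can be computed from uniformly sampled data, as in the following minimal sketch. It rests on stated assumptions: the integrals are replaced by sample averages, the transient start and end indices are supplied by the caller, and the impulse multiple is taken as the peak rise divided by the steady-state rise, $(P_{peak} - P_{S1})/(P_{S2} - P_{S1})$. The function and parameter names are hypothetical.

```python
import math

def mean_value(samples):
    """Discrete counterpart of (1/T) * integral of i(t) dt."""
    return sum(samples) / len(samples)

def rms_value(samples):
    """Discrete counterpart of sqrt((1/T) * integral of i(t)^2 dt)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def transient_features(power, dt, on_idx, off_idx):
    """Transition time and impulse multiple from a sampled power series.

    power   : power samples in watts
    dt      : sampling interval in seconds
    on_idx  : index at which the transient starts (assumed known)
    off_idx : index at which the transient ends (assumed known)
    """
    p_s1 = mean_value(power[:on_idx])          # steady state before switching
    p_s2 = mean_value(power[off_idx:])         # steady state after switching
    p_peak = max(power[on_idx:off_idx + 1])    # maximum during the transient
    return {
        "transition_time": (off_idx - on_idx) * dt,
        "impulse_multiple": (p_peak - p_s1) / (p_s2 - p_s1),
    }

# Toy start-up: 0 W, a 300 W inrush peak, then settling at 100 W
power = [0.0] * 5 + [300.0, 200.0, 150.0] + [100.0] * 5
print(transient_features(power, dt=0.5, on_idx=5, off_idx=8))
```

A purely resistive load would give an impulse multiple close to 1 and a transition time close to zero, which matches the qualitative observation drawn from Table 2.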

Steady-State Feature Extraction.
The steady-state characteristics refer to the characteristics of an electrical appliance in a stable operation state. In other words, the steady-state characteristics are obtained by analyzing the differences in certain characteristics between two stable operation states [13]. This paper uses the V-I trajectory, the power characteristics, and the harmonic matrix.
(1) V-I Trajectory. The shape features adopted by the V-I trajectory method mainly include the current span, trajectory area, absolute area, standard deviation of instantaneous resistance, curvature, slope, total area, left and right areas, asymmetry, intersection point, etc. [14]. To prevent differences in the voltage and current amplitudes of different loads from affecting the size of the V-I trajectory, the two quantities must be normalized before the shape features are compared. Using the frequency data provided in annex 1, the V-I trajectory curves of some equipment are drawn with the normalized voltage as the abscissa and the normalized current as the ordinate, as shown in Figures 5 and 6.
As can be seen from Figures 5 and 6, for resistive loads, such as the Joyang hot pot, the V-I trajectory is a straight line, while for a load with high harmonic content, such as the Midea microwave, the V-I trajectory contains at least one intersection point. The two kinds of trajectories differ significantly, so the V-I trajectory can be used as a distinguishing feature of electrical equipment.
(1) The current span $itc$, which is defined as

$$itc = \max(I) - \min(I),$$

where $I$ is the current sequence and $\max(I)$ and $\min(I)$ are the maximum and minimum values of the current sequence.

(2) The trajectory area $area$ of the normalized V-I trajectory curve. The normalized value $V'_m$ is obtained from the voltage sequence $V_m$ as

$$V'_m = \frac{V_m}{\max(V)},$$

where $\max(V)$ is the maximum value of the voltage sequence, $m \in [1, NT + ip]$, $NT$ is the number of sampling points in a period, and $ip$ is the number of preset interpolation points. The normalized value $I'_m$ is obtained from the current sequence $I_m$ as

$$I'_m = \frac{I_m}{\max(I)},$$

where $\max(I)$ is the maximum value of the current sequence $I$. The trajectory area $area$ is computed from the normalized points $(V'_m, I'_m)$.

(3) The absolute area $absarea$ of the normalized V-I trajectory curve.

(4) The standard deviation of instantaneous resistance $D$ [15], which is the standard deviation of $R_n$ about its average value $\bar{R}$, where $R_n = V'_n / I'_n$ is the instantaneous resistance at the $n$-th sampling point, $V'_n$ and $I'_n$ are the normalized voltage and current values at the $n$-th sampling point, $n \in [1, NT + ip]$, $NT$ is the number of sampling points in a period, and $ip$ is the number of preset interpolation points.
According to the size of the power, the working state of the equipment is divided into several gears; the greater the power, the higher the gear. From device 1 to device 11, there are at most five working states, so the working state of the device is divided into five levels. One second of device data is randomly selected from each running state to draw the V-I trajectory. Based on the above steps and the single-state data provided in Annex 1, the V-I trajectory feature database is obtained; Table 3 lists the V-I trajectory features of gear 1 of each device (a missing entry indicates that the device does not have this gear).
As can be seen from Table 3, the V-I trajectory characteristics of gear 1 differ considerably between devices, especially the current span and the standard deviation of instantaneous resistance, and the resulting trajectories are clearly distinct, so the V-I trajectory characteristics can be used to distinguish electrical equipment.
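As an illustration of how such shape features might be computed from one period of sampled voltage and current, consider the following sketch. Several points are assumptions rather than the paper's stated procedure: the trajectory area is taken as the signed polygon (shoelace) area of the closed normalized trajectory, the population form of the standard deviation is used, samples whose normalized current is near zero are skipped when forming the instantaneous resistance, and interpolation points are omitted. The names `vi_features` and `eps` are hypothetical.

```python
import math

def vi_features(voltage, current, eps=1e-6):
    """Shape features of the normalized V-I trajectory for one period."""
    v_max, i_max = max(voltage), max(current)
    v = [x / v_max for x in voltage]           # normalized voltage V'
    i = [x / i_max for x in current]           # normalized current I'
    itc = max(current) - min(current)          # current span
    n = len(v)
    # Signed area enclosed by the closed trajectory (shoelace formula)
    area = 0.5 * sum(
        v[m] * i[(m + 1) % n] - v[(m + 1) % n] * i[m] for m in range(n)
    )
    # Instantaneous resistance, skipping near-zero current samples
    r = [vn / cn for vn, cn in zip(v, i) if abs(cn) > eps]
    r_mean = sum(r) / len(r)
    d = math.sqrt(sum((x - r_mean) ** 2 for x in r) / len(r))
    return {"itc": itc, "area": area, "D": d}

# A purely resistive load: current proportional to voltage
n = 64
theta = [2 * math.pi * m / n for m in range(n)]
features = vi_features([311 * math.sin(t) for t in theta],
                       [2 * math.sin(t) for t in theta])
print(features)
```

For the resistive example the trajectory collapses to a straight line, so the enclosed area and the resistance spread are both essentially zero, consistent with the straight-line trajectory described for resistive loads.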
(2) Power Characteristics. Active power is the total power consumed by the load during operation. If the load is purely resistive, the voltage and current waveforms are always in phase, so there is no reactive component. However, when inductive or capacitive elements are present, there is always a phase shift between the current and voltage waveforms, which produces or consumes reactive power. Active power and reactive power are calculated as follows [16]:

$$P = \sum_{k} U_k I_k \cos\phi_k, \qquad Q = \sum_{k} U_k I_k \sin\phi_k,$$

where $U$ is the effective value of voltage when the power load is running, $I$ is the effective value of current when the power load is running, $\phi$ is the power factor angle when the power load is running, and $k$ is the harmonic order. The active power and reactive power of each device are plotted on the same coordinate axes to obtain a comparison diagram of active and reactive power for each device. The comparison diagrams of the YD1 and YD9 devices are shown in Figure 7.
As can be seen from Figure 7, the active power of YD1 is always greater than its reactive power, whereas the active power of YD9 is not: during one sampling period it falls below the reactive power. YD9 therefore differs clearly from the other equipment in the comparison of active power and reactive power.
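A small sketch of the power calculation follows, assuming the harmonic-sum form $P = \sum_k U_k I_k \cos\phi_k$ and $Q = \sum_k U_k I_k \sin\phi_k$ with effective (rms) values per harmonic. The function name and the input layout are hypothetical.

```python
import math

def active_reactive_power(harmonics):
    """Sum active and reactive power over harmonic components.

    harmonics: iterable of (U_k, I_k, phi_k) tuples, with rms voltage in
    volts, rms current in amperes, and phase angle in radians.
    """
    harmonics = list(harmonics)
    p = sum(u * i * math.cos(phi) for u, i, phi in harmonics)
    q = sum(u * i * math.sin(phi) for u, i, phi in harmonics)
    return p, q

# Resistive load: voltage and current in phase, so Q vanishes
print(active_reactive_power([(220.0, 5.0, 0.0)]))
# Strongly inductive load: reactive power exceeds active power
print(active_reactive_power([(220.0, 5.0, math.radians(60))]))
```

The second call shows the situation observed for YD9 in part of its cycle: once the phase angle exceeds 45 degrees, the reactive power is larger than the active power.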

(3) Harmonic Matrix.
The harmonic data contains the unique characteristics of different electrical appliances. The harmonics of the load voltage or current can be extracted by Fourier transform or wavelet transform and then used to identify the load. It should be noted that most loads produce even harmonics with small amplitude and odd harmonics with large amplitude, and that the low-order harmonics contain a large amount of information [17]. Therefore, this paper selects the 2nd to 11th harmonic data for study. The amplitude of each harmonic content rate of each device is calculated to obtain the harmonic feature database in Table 4. The data in each row is the amplitude of the $k$th ($k = 2, 3, \ldots, 11$) harmonic content rate of each device.
It can be seen from Table 4 that resistive loads, such as incandescent lamps and kettles, produce few harmonics, while nonresistive loads, such as induction cookers and electric fans, produce rich harmonics. The second and third harmonic contents of YD1, YD2, YD3, YD5, YD6, and YD8 are above 90%, but the harmonic contents of YD9, YD10, and YD11 are significantly lower than 90%, which can be used to distinguish these loads.
In this paper, the variance of the current harmonic content rate of each device under different working conditions is calculated to describe how the harmonic content rate varies across working conditions. A missing value indicates that the gear does not exist in the device. For example, device 1 has no 4th or 5th gear. The result is shown in Table 5.
As can be seen from Table 5, in the closed state, the variance of harmonic content rate of YD1, YD2, YD3, YD5, and YD6 is greater than that of the other equipment. For a single device, such as YD4, the variance of harmonic content rate is small in the closed state, and then the harmonic content rate increases rapidly when switching to the first gear. In addition, the higher the gear, the lower the variance of harmonic content rate, and the harmonic content rate becomes almost constant. Therefore, the variance of harmonic content rate can be used as a basis for identification.
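The harmonic content rate can be sketched with a direct discrete Fourier sum. The sketch assumes that the samples cover exactly one fundamental period, that the content rate of order $k$ is the ratio of the $k$th harmonic amplitude to the fundamental amplitude, and that the variance for one working state is taken over orders 2 to 11; the paper does not spell out these details, and the function names are hypothetical.

```python
import cmath
import math
import statistics

def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic from one period of samples."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * m / n)
              for m, x in enumerate(samples))
    return 2 * abs(acc) / n

def harmonic_content_rates(samples, orders=range(2, 12)):
    """Ratio of each harmonic amplitude to the fundamental amplitude."""
    fundamental = harmonic_amplitude(samples, 1)
    return {k: harmonic_amplitude(samples, k) / fundamental for k in orders}

def content_rate_variance(samples):
    """Population variance of the content rates of orders 2 to 11."""
    return statistics.pvariance(list(harmonic_content_rates(samples).values()))

# Toy current: fundamental plus a 50 % third harmonic
n = 64
current = [math.sin(2 * math.pi * m / n) + 0.5 * math.sin(6 * math.pi * m / n)
           for m in range(n)]
rates = harmonic_content_rates(current)
print(round(rates[3], 3), round(content_rate_variance(current), 4))
```

With real measurements the window rarely spans an exact period, so a windowed FFT over several periods would usually replace the direct sum.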

Similarity and Weight Coefficient.
To automatically identify an unknown single device, the characteristic similarity of load marks can be analyzed [14]. The feature similarity $S$ is defined as

$$S = \frac{1}{\left\| \dfrac{\| Y_i - Y_x \|}{\| Y_i \|} \right\|_2},$$

where $Y_x$ is the eigenvector of the unknown device $x$ and $Y_i$ is the eigenvector of device $i$. The larger the value of $S$, the higher the similarity between the unknown device $x$ and the known device $i$. The similarity of the load marks extracted in this paper is calculated for four types of features. The term $1 / \left\| \, \|Z_i - Z_x\| / \|Z_i\| \, \right\|_2$ represents the similarity of the transient characteristics of device YD$_i$ and device YD$_x$. Similarly, $1 / \left\| \, \|V_i - V_x\| / \|V_i\| \, \right\|_2$ and $1 / \left\| \, \|X_i - X_x\| / \|X_i\| \, \right\|_2$ represent the similarity of the V-I trajectory characteristics and the harmonic characteristics of device YD$_i$ and device YD$_x$.
$H_{ix}$ represents the contrast similarity of active and reactive power, defined as the image similarity between the active-reactive power contrast figures of the two devices. The similarity calculation employs the histogram method [18]: first, the histogram of each contrast figure is calculated; finally, the contrast similarity between the active power and the reactive power of device YD$_i$ and device YD$_x$ is calculated. The total similarity is calculated by weighting, and the weights are determined by the entropy method [19], using the entropy weight coefficient [20] of each target. Through the entropy value method, the weight of each feature similarity is $w = (0.151, 0.342, 0.375, 0.132)$.

Establishment of the Similarity Model. The load identification model based on similarity is a weighted sum of all kinds of feature similarities, which gives the total load feature similarity. The specific model is as follows:

$$S_{ix} = \frac{w_1}{\left\| \dfrac{\| Z_i - Z_x \|}{\| Z_i \|} \right\|_2} + \frac{w_2}{\left\| \dfrac{\| V_i - V_x \|}{\| V_i \|} \right\|_2} + \frac{w_3}{\left\| \dfrac{\| X_i - X_x \|}{\| X_i \|} \right\|_2} + w_4 H_{ix},$$

where $Z_i$ is the transient eigenvector of YD$_i$, $Z_x$ is the transient eigenvector of YD$_x$, $V_i$ is the V-I trajectory eigenvector of YD$_i$, $V_x$ is the V-I trajectory eigenvector of YD$_x$, $X_i$ is the harmonic eigenvector of YD$_i$, $X_x$ is the harmonic eigenvector of YD$_x$, and $H_{ix}$ is the contrast similarity of active power and reactive power between the device to be tested and the known device.
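The weighting and identification logic can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: the inner norm is read as an elementwise relative deviation, the four weights are applied in the order transient, V-I trajectory, harmonic, power contrast, the contrast similarity $H_{ix}$ is supplied precomputed, and the entropy weights follow the textbook entropy weight method. All function names and the dictionary layout are hypothetical.

```python
import math

# Weights reported in the text; the order of the four features is assumed
WEIGHTS = (0.151, 0.342, 0.375, 0.132)

def feature_similarity(known, unknown):
    """Inverse 2-norm of the elementwise relative deviation."""
    rel = [(k - u) / k for k, u in zip(known, unknown) if k != 0]
    dist = math.sqrt(sum(r * r for r in rel))
    return math.inf if dist == 0 else 1.0 / dist   # identical vectors

def total_similarity(known, unknown, h_ix, weights=WEIGHTS):
    """Weighted sum of the three vector similarities and H_ix."""
    parts = [feature_similarity(known[name], unknown[name])
             for name in ("transient", "vi", "harmonic")]
    parts.append(h_ix)
    return sum(w * s for w, s in zip(weights, parts))

def identify(database, unknown, h):
    """Return the known device with the highest total similarity."""
    return max(database,
               key=lambda dev: total_similarity(database[dev], unknown, h[dev]))

def entropy_weights(matrix):
    """Entropy weight method; matrix[i][j] >= 0 is sample i, criterion j."""
    n, m = len(matrix), len(matrix[0])
    diversity = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        entropy = -sum((v / total) * math.log(v / total)
                       for v in col if v > 0) / math.log(n)
        diversity.append(1.0 - entropy)
    scale = sum(diversity)
    return [d / scale for d in diversity]

database = {
    "A": {"transient": [1.0, 2.0], "vi": [1.0], "harmonic": [0.5]},
    "B": {"transient": [2.0, 4.0], "vi": [3.0], "harmonic": [0.9]},
}
unknown = {"transient": [1.1, 2.1], "vi": [1.1], "harmonic": [0.55]}
print(identify(database, unknown, h={"A": 0.9, "B": 0.2}))
```

Because the similarity is an inverse distance, it grows without bound as two feature vectors coincide; the sketch returns infinity in that case, and a practical implementation might cap the value instead.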

Feature Extraction and Recognition of Unknown Devices. Using the V-I trajectory and harmonic matrix methods, the feature matching data of unknown devices $X_1$ and $X_2$ are extracted and listed in Tables 6 to 8.
It can be seen from the characteristic data of Table 6 that when the devices $X_1$ and $X_2$ to be tested are in the first gear position, the standard deviation of instantaneous resistance of the V-I trajectory is relatively large.
As can be seen from Table 7, the unknown equipment $X_1$ produces few harmonics, and the unknown equipment $X_2$ produces abundant harmonics. The third and fifth harmonic content rates of $X_2$ are nearly 50%, but the harmonic content of $X_1$ is less than 1%.
As can be seen from Table 8, in the closed state, the variances of harmonic content rate of $X_1$ and $X_2$ differ little. For equipment $X_1$, the variance of harmonic content rate is small in the closed state; the harmonic content rate then decreases slowly when switching to the 1st level and continues to decrease when switching to the 2nd level. For equipment $X_2$, the variance of harmonic content rate is small in the closed state, then gradually decreases as the gear increases, and finally remains almost constant. Through the established model and relevant data, the similarity between the unknown devices $X_1$ and $X_2$ and devices YD1 to YD11 is calculated. As can be seen from Tables 9 and 10, the similarity between the unknown device $X_1$ and device 8 is the highest; that is, the unknown device $X_1$ is device 8. The similarity between the unknown device $X_2$ and device 9 is the highest; that is, the unknown device $X_2$ is device 9.

Calculation of Real-Time Power Consumption of Unknown Equipment.
In the equipment data given in Annex 2, $U$, $I$, and $PFC$ are the measured voltage, current, and power factor, respectively. The real-time power consumption is calculated as

$$P = U \cdot I \cdot PFC,$$

where $U$ represents voltage, $I$ represents current, and $PFC$ represents power factor. According to the above formula and the data given in Annex 2, the real-time power consumption of the unknown devices is obtained. Table 11 shows partial calculation results of the real-time power consumption of unknown equipment $X_1$.
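As a small worked sketch of this calculation, the snippet below assumes that the real-time power is the product of voltage, current, and power factor, and additionally accumulates energy over uniformly spaced records. The energy accumulation, the sampling interval, and the function names are illustrative assumptions that do not come from the paper.

```python
def real_time_power(u, i, pfc):
    """Real-time power in watts from voltage, current, and power factor."""
    return u * i * pfc

def energy_kwh(records, dt_seconds):
    """Energy in kWh from (U, I, PFC) records taken every dt_seconds."""
    joules = sum(real_time_power(u, i, pfc) for u, i, pfc in records) * dt_seconds
    return joules / 3.6e6            # 1 kWh = 3.6e6 J

records = [(220.0, 5.0, 0.8), (221.0, 5.1, 0.8), (219.5, 4.9, 0.79)]
for u, i, pfc in records:
    print(round(real_time_power(u, i, pfc), 1))
print(energy_kwh(records, dt_seconds=1.0))
```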

Conclusion
In the data analysis, this paper first uses MATLAB to detect and distinguish the abnormal points and mutation points of the original data by the wavelet decomposition $W_k$ value method. Secondly, the data is denoised by wavelet noise reduction, which completes the preprocessing of the sampled data points of each device. Finally, the abnormal point detection results of a certain device are obtained, and the waveform diagram after noise reduction is drawn.
In the process of feature extraction, first, the transient and steady-state characteristics of a single device are extracted by analyzing the preprocessed data, including active power, reactive power, harmonic current, and voltage-current trajectory (V-I trajectory). Secondly, the computation and extraction methods of the characteristic values of each load characteristic are given. Finally, the characteristic values of the equipment are obtained, containing the V-I trajectory characteristics of gears 1, 2, 3, 4, and 5, the comparison diagram of active power and reactive power of each device, the amplitude of the $k$th ($k = 2, 3, \ldots, 11$) harmonic content rate of the equipment, and the variance of harmonic content rate of each operating state of the equipment.
For the automatic identification of a single device, this paper establishes a similarity-based load identification model on the four types of load characteristics extracted. First, the feature similarity is defined to denote the degree of similarity of any two devices, and the weight coefficient of each feature similarity is determined by the entropy value method. Secondly, the weighted sum of feature similarities is used to determine the total feature similarity, and the device with the highest similarity is matched with the unknown device. Finally, the similarity data between the unknown devices and devices 1-11 are obtained. According to the calculation results, the unknown device $X_1$ is determined to be device 8, and the unknown device $X_2$ is determined to be device 9.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.