Electrical Data Matrix Decomposition in Smart Grid

Abstract: With the development of the smart grid and the energy internet, the amount of data transmitted in real time has increased significantly. Because existing communication networks were not designed to carry high-speed, real-time data, data loss and data quality degradation may occur constantly. To address this problem, we exploit the strong spatial and temporal correlation of electricity data, which is generated by human actions and feelings, and build a low-rank electricity data matrix in which each row is a time and each column is a user. Inspired by matrix decomposition, we factorize the low-rank electricity data matrix into the product of two small matrices and use the known data to approximate the matrix and recover the missing electrical data. Using real electricity data, we analyze the low-rankness of the electricity data matrix and apply the matrix decomposition-based method to the real data. The experimental results verify the efficiency of the proposed scheme.


Introduction
With the development of the smart grid and the energy internet [Tsoukalas and Gao (2008)], the amount of data transmitted in real time has increased significantly. Because existing communication networks were not designed to carry high-speed, real-time data, data loss and data quality degradation may occur constantly. For this problem, the most common data recovery methods [Tu, Lin, Wang et al. (2018); Meng, Rice, Wang et al. (2018)] are used, such as mean, regression, interpolation and deep learning [Zeng, Dai, Li et al. (2018); Xiang, Li, Hao et al. (2018)]. Because electricity data is generated by human actions and feelings and therefore has a strong spatial and temporal correlation, some work uses weather information as an aid to recover electrical data via collective matrix factorization [Han, Dang, Zhang et al. (2018)]. However, weather information differs considerably between locations, so it can only be used to recover the electrical data of a single location. Inspired by matrix decomposition, or matrix factorization (MF) [Tikk (2008); Hoyer (2004)], we treat the electricity data as a low-rank matrix whose two dimensions are day and user. We factorize the low-rank electricity data matrix into the product of two small matrices and use the known data to approximate the matrix and recover the missing electrical data. We then apply the matrix decomposition-based method to real electricity data. The experimental results verify the efficiency of the proposed scheme. The remainder of this paper is organized as follows. Section 2 introduces the system model. Section 3 presents electrical data matrix factorization. Section 4 provides simulation results and analyses. Finally, we conclude this work in Section 5.

System model
Generally, the value recorded by a smart meter is the cumulative power consumption of a user up to each day. The difference between two consecutive values is the power consumption on one day. We take the power consumption on each day as the electricity data. The electricity data is generated by human actions and feelings, so it has a strong spatial and temporal correlation. At the same time, human actions and feelings are periodic. Therefore, we treat the electrical data as a matrix $X \in \mathbb{R}^{H \times N}$ with $N$ users and $H$ days, that is, $H$ daily measurements for each of the $N$ users. An element $x_{ij}$ represents the power consumption of user $j$ on the $i$th day, as shown in Fig. 1. The electrical matrix has many lost elements. The subset $\Omega$ of the matrix is the known set, where the elements $x_{ij}$, $(i, j) \in \Omega$, are known. As shown in Fig. 2, the recovery task is to estimate the lost or unknown elements of the matrix from the spatial and temporal correlation and periodicity of the data so as to minimize the recovery error, which is usually defined as the squared error $(x_{ij} - \hat{x}_{ij})^2$, where $x_{ij}$ is the real value and $\hat{x}_{ij}$ is the estimated value. The low-rankness of the electricity data matrix is analyzed in [Han, Dang, Zhang et al. (2018)].
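The construction of the matrix and of its known set can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `daily_consumption` and `build_matrix` are hypothetical, a lost meter reading is marked with `None`, and a lost reading is assumed to make both adjacent daily values unknown. The toy readings are made up for the example.

```python
def daily_consumption(cumulative):
    """Turn cumulative meter readings into per-day consumption.

    cumulative: list of readings for one user; None marks a lost reading
    (an assumption of this sketch). A day is unknown (None) if either of
    the two readings that bound it is lost.
    """
    out = []
    for prev, cur in zip(cumulative, cumulative[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def build_matrix(readings_by_user):
    """Build X with rows = days and columns = users, plus the known set."""
    columns = [daily_consumption(r) for r in readings_by_user]
    days, users = len(columns[0]), len(columns)
    X = [[columns[j][i] for j in range(users)] for i in range(days)]
    omega = {(i, j) for i in range(days) for j in range(users)
             if X[i][j] is not None}
    return X, omega


# Toy cumulative readings for two users over five meter reads (made-up values).
readings = [
    [100.0, 103.5, None, 110.0, 112.0],
    [50.0, 52.0, 55.0, 57.5, 61.0],
]
X, omega = build_matrix(readings)
```

Here `X` has four rows (days) and two columns (users), and `omega` holds the index pairs of the entries that survived the lost reading.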

Electrical data Matrix Factorization
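MF techniques approximate a low-rank matrix $X$ as a product of two much smaller matrices:
$$X \approx UV,$$
where $U \in \mathbb{R}^{H \times K}$ and $V \in \mathbb{R}^{K \times N}$ have elements $u_{ik}$ and $v_{kj}$, and $K$ is much smaller than $H$ and $N$. The estimate of $x_{ij}$ and its error are
$$\hat{x}_{ij} = \sum_{k=1}^{K} u_{ik} v_{kj}, \qquad e_{ij} = x_{ij} - \hat{x}_{ij}, \quad (i, j) \in \Omega,$$
and the factors are obtained from
$$(U, V) = \arg\min_{U, V} \sum_{(i, j) \in \Omega} e_{ij}^2, \tag{5}$$
where $e_{ij}$ denotes the training error on the $(i, j)$-th element. Problem (5) states that the optimal $U$ and $V$ minimize the sum of squared errors only on the known elements of $X$. We can use a simple incremental gradient descent method to find a local minimum, where one gradient step is intended to decrease the square of the prediction error. We compute the gradient of $e_{ij}^2$:
$$\frac{\partial e_{ij}^2}{\partial u_{ik}} = -2 e_{ij} v_{kj}, \qquad \frac{\partial e_{ij}^2}{\partial v_{kj}} = -2 e_{ij} u_{ik}.$$
Having obtained the gradient, we can now formulate the update rules for $u_{ik}$ and $v_{kj}$ as follows:
$$u_{ik} \leftarrow u_{ik} + 2 \eta e_{ij} v_{kj}, \qquad v_{kj} \leftarrow v_{kj} + 2 \eta e_{ij} u_{ik},$$
where $\eta$ is a small value that determines the rate of approaching the minimum.

To avoid overfitting, a regularized MF that penalizes the square of the Euclidean norm of the weights is introduced:
$$\min_{U, V} \sum_{(i, j) \in \Omega} \Big( x_{ij} - \sum_{k=1}^{K} u_{ik} v_{kj} \Big)^2 + \lambda \big( \|U\|_F^2 + \|V\|_F^2 \big),$$
where $\|\cdot\|_F$ represents the Frobenius norm and $\lambda$ weights the penalty. The first term in the objective function controls the error in the matrix factorization process. The last two terms are the squared Euclidean norms of the factor matrices. The regularization penalty keeps the entries of the factor matrices from growing too large, which limits overfitting. The objective function is not jointly convex in the variables $U$ and $V$. We solve it by gradient descent, using the partial derivatives with respect to the variables as the gradient.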
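Having obtained the gradient, we can now formulate the update rules for one known element $(i, j) \in \Omega$ as follows:
$$u_{ik} \leftarrow u_{ik} + 2 \eta \, (e_{ij} v_{kj} - \lambda u_{ik}), \qquad v_{kj} \leftarrow v_{kj} + 2 \eta \, (e_{ij} u_{ik} - \lambda v_{kj}),$$
where $\eta$ is a small value that determines the rate of approaching the minimum and $\lambda$ is the regularization weight. In summary, the stochastic gradient descent (SGD) algorithm of MF for recovery is shown as Algorithm 1.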
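As a rough illustration of the regularized stochastic updates described in this section, the following Python sketch recovers hidden entries of a small rank-1 matrix. It is not the authors' Algorithm 1: the names `mf_recover` and `predict`, the parameters `K`, `eta`, `lam`, `tol`, `max_epochs` and `seed`, the random initialization range, the use of `None` for unknown entries, and the stopping rule (mean squared training error below `tol`) are all assumptions made for the example. The penalty is applied at every per-element step, which is one common choice.

```python
import random


def mf_recover(X, omega, K, eta=0.01, lam=0.001, tol=1e-4,
               max_epochs=5000, seed=0):
    """Regularized MF trained by SGD on the known entries only.

    X: list of rows (H x N), unknown entries may be None.
    omega: set of (i, j) index pairs of the known entries.
    Returns U (H x K) and V (K x N) such that U V approximates X on omega.
    """
    rng = random.Random(seed)
    H, N = len(X), len(X[0])
    U = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(H)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(N)] for _ in range(K)]
    known = sorted(omega)
    for _ in range(max_epochs):
        rng.shuffle(known)              # visit known entries in random order
        sse = 0.0
        for i, j in known:
            e = X[i][j] - sum(U[i][k] * V[k][j] for k in range(K))
            sse += e * e
            for k in range(K):
                u, v = U[i][k], V[k][j]  # keep old values for both updates
                U[i][k] = u + 2 * eta * (e * v - lam * u)
                V[k][j] = v + 2 * eta * (e * u - lam * v)
        if sse / len(known) < tol:      # assumed error-threshold stopping rule
            break
    return U, V


def predict(U, V, i, j):
    """Estimate of element (i, j) from the two factors."""
    return sum(U[i][k] * V[k][j] for k in range(len(V)))


# Toy rank-1 matrix (outer product of [1,2,3,4] and [1,2,3]) with two
# hidden entries; both hidden true values are 4.0.
X = [[1.0, 2.0, 3.0],
     [2.0, None, 6.0],
     [3.0, 6.0, 9.0],
     [None, 8.0, 12.0]]
omega = {(i, j) for i in range(4) for j in range(3) if X[i][j] is not None}
U, V = mf_recover(X, omega, K=1)
recovered = {(i, j): predict(U, V, i, j) for (i, j) in [(1, 1), (3, 0)]}
```

The rank `K` is set to 1 here because the toy matrix has rank 1. With a larger `K`, the hidden entries of such a small example are no longer determined by the known ones, which is why the rank should be chosen to match the low-rank structure of the data.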

Algorithm 1 SGD Algorithm of MF for recovery
Input: X, error threshold

Simulation
The real electrical data comes from the Lanzhou power system company and covers 160 users in Jiuquan, Gansu Province, from August 1, 2016 to August 31, 2017. After excluding the lost data, we obtain real data for 160 users over 385 days in which all values are known.
The real data is treated as a matrix $X \in \mathbb{R}^{385 \times 160}$, with 160 users and 385 days. The root mean squared error (RMSE) is used to evaluate the recovery accuracy, which is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|\bar{\Omega}|} \sum_{(i, j) \in \bar{\Omega}} (x_{ij} - \hat{x}_{ij})^2},$$
where $\bar{\Omega}$ is the set of entries whose values are unknown and $|\bar{\Omega}|$ is the number of unknown entries. The smaller the RMSE, the higher the recovery accuracy. We compare our scheme with average filling (AVG) recovery at different sample ratios. The sample ratio is the ratio of the number of known elements to the number of all elements in the electrical data. The higher the ratio, the more elements are known and the fewer elements need to be recovered. We set the sample ratio from 85% to 97.5%, increasing in steps of 2.5%. Fig. 4 shows the RMSE of MF and AVG at the different sample ratios. At every sample ratio, the recovery accuracy of MF is better than that of AVG. The reason is that MF uses more periodicity information. As the sample ratio increases, the RMSE decreases, because the more information is known, the more potential relationships are available to improve recovery accuracy.
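The evaluation procedure can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the names `sample_mask`, `avg_fill` and `rmse` are hypothetical, known entries are assumed to be drawn uniformly at random, and AVG is assumed to fill an unknown entry with the mean of the same user's known values, since the text does not define it further. The small matrix is made up for the example.

```python
import math
import random


def sample_mask(H, N, ratio, seed=0):
    """Split all index pairs into a known set and an unknown set.

    ratio is the sample ratio: known elements / all elements.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(H) for j in range(N)]
    rng.shuffle(cells)
    n_known = round(ratio * len(cells))
    return set(cells[:n_known]), set(cells[n_known:])


def avg_fill(X, known, j):
    """Assumed AVG baseline: mean of the known values of user (column) j."""
    vals = [X[i][j] for i in range(len(X)) if (i, j) in known]
    return sum(vals) / len(vals) if vals else 0.0  # 0.0 if nothing is known


def rmse(X, unknown, estimate):
    """Root mean squared error over the unknown entries only."""
    se = sum((X[i][j] - estimate(i, j)) ** 2 for (i, j) in unknown)
    return math.sqrt(se / len(unknown))


# Made-up complete 4 x 3 matrix; 75% of the entries are kept as known.
X = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [3.0, 6.0, 9.0],
     [4.0, 8.0, 12.0]]
known, unknown = sample_mask(4, 3, ratio=0.75)
avg_error = rmse(X, unknown, lambda i, j: avg_fill(X, known, j))
```

Any recovery method can be scored the same way by passing its own estimate function to `rmse`, so MF and AVG are compared on exactly the same unknown entries.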

Figure 4: The RMSE with different sample ratios

Conclusion
Based on the strong spatial and temporal correlation and the periodicity of electricity data, we treat the data as a low-rank matrix whose dimensions are day and user. We apply the matrix decomposition-based method to real data. The experimental results on real data verify the recovery accuracy of the proposed scheme.