Robust Graph Factorization for Multivariate Electricity Consumption Series Clustering

Multivariate electricity consumption series clustering can reflect trends of power consumption changes in the past time period, which can provide reliable guidance for electricity production. However, past multivariate electricity consumption data contain abnormal series, and these outliers affect the discovery of electricity consumption trends in different time periods. To address this problem, we propose a robust graph factorization model for multivariate electricity consumption clustering (RGF-MEC), which performs graph factorization and outlier discovery simultaneously. RGF-MEC first obtains a similarity graph by calculating distances among multivariate electricity consumption series and then performs robust matrix factorization on the similarity graph. The similarity graph is decomposed into a class-related embedding and a spectral embedding, where the class-related embedding directly reveals the final clustering results. Experimental results on realistic multivariate time-series datasets and a multivariate electricity consumption series dataset demonstrate the effectiveness of the proposed RGF-MEC model.


Introduction
Recently, multivariate electricity consumption series clustering has been an important issue in machine learning fields [1][2][3]. In multivariate electricity consumption series, each instance consists of multiple time series from different sources, which often contain information related to each other [4]. For example, multivariate electricity consumption series are composed of global-active-power series, global-reactive-power series, voltage series, and global-intensity series [5,6]. Therefore, multivariate electricity consumption series clustering needs to analyze the relationship among these series [7,8]. In other words, the objective of multivariate electricity consumption series clustering is to discover the relationship among multiple series and divide instances into groups.
There have been many works on time-series clustering, including univariate time-series clustering [9,10] and multivariate time-series clustering [7,11]. It is well known that k-shape [12], k-DTW Barycenter Averaging (KDBA) [13], kAVG + ED [14], and k-Spectral Centroid (KSC) [15] are effective distance-based univariate time-series clustering algorithms. k-shape [12] utilized a normalized version of the cross-correlation measure in order to consider the shapes of time series. KDBA [13] was a global averaging method for dynamic time warping, where a new strategy was used to reduce the length of the resulting average sequence. kAVG + ED [14] made use of an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences. KSC [15] can effectively find cluster centroids with a similarity measure, which applied an adaptive wavelet-based incremental approach to clustering. Because k-shape, KDBA, kAVG + ED, and KSC have achieved great success in univariate time-series clustering, researchers considered the relationship among multiple series and extended these methods for multivariate time-series clustering (i.e., m-kShape, m-kDBA, m-kAVG + ED, and m-KSC) [16].
In addition to extending existing univariate time series methods, researchers have proposed some unsupervised feature learning works which can learn informative features of multivariate time series. For example, two-dimensional singular value decomposition [15], variable-based principal component analysis (VPCA) [17], common principal component analysis (CPCA) [18][19][20], and deep encoder networks [21,22] are used to reduce dimensionality of multivariate time series and learn informative features for clustering. He et al. [17] proposed a spatial weighted matrix distance-based fuzzy clustering (SWMDFC) model for multivariate time-series clustering. SWMDFC made use of VPCA to achieve dimensionality reduction and reduce computational consumption and then utilized spatial weighted matrix distance to compute distance among multivariate time-series data. Li [20] proposed a multivariate time-series clustering method based on CPCA, which used CPCA to construct projection coordinate features from multivariate time series and reconstructed multivariate time series from coordinate features. Franceschi et al. [21] proposed an unsupervised scalable representation learning model (USRL) for multivariate time series, which utilized a deep encoder network formed by dilated convolutions to generate informative features. However, noisy data and outliers are ubiquitous in realistic multivariate electricity consumption series data. Moreover, most existing works ignore the existence of outliers and noisy data, which can significantly affect clustering performance on multivariate electricity consumption series data.
In this paper, we develop a novel robust graph factorization framework (RGF-MEC) for multivariate electricity consumption series, which is a joint learning framework for clustering multivariate electricity consumption series and discovering outliers in them. After calculating the similarity matrix among multivariate electricity consumption series data, RGF-MEC performs robust orthogonal and nonnegative matrix factorization on this similarity graph structure. In addition, it can directly reveal clustering results without using k-means clustering, which is sensitive to initialization. Therefore, the main contributions of this paper can be summarized as follows:
(1) This paper proposes a novel multivariate electricity consumption series framework (RGF-MEC) to simultaneously factorize the graph structure and discover outliers in multivariate electricity consumption series data.
(2) RGF-MEC utilizes robust matrix factorization to simultaneously learn nonnegative class-related representations and orthogonal spectral representations.
(3) We perform experiments on realistic datasets to verify the proposed RGF-MEC model, and the results demonstrate that RGF-MEC is an effective multivariate electricity consumption series clustering framework.
The remainder of this paper is organized as follows. Section 2 introduces related works, including spectral clustering, symmetric nonnegative matrix factorization (SymNMF), and orthogonal and nonnegative matrix factorization (ONMF). Section 3 details the proposed RGF-MEC framework, including the specific derivation process and complexity analysis. In Section 4, experimental results show the feasibility of the proposed RGF-MEC framework. Finally, some conclusions are given in Section 5.

Related Works
This section first explains the relationship between spectral clustering and symmetric nonnegative matrix factorization and then introduces the ONMF model. Suppose $X$ denotes the original instances, $X_{i.}$ denotes the $i$th instance, $L$ denotes the normalized Laplacian matrix of the original instances, and $F \in \mathbb{R}^{N \times K}$ denotes the spectral embedding. Spectral clustering performs an eigendecomposition of the normalized Laplacian matrix $L$ and constructs clusters by processing the eigenvectors [23], which has the advantage of clustering on a sample space of arbitrary shape and converging to an optimal solution. Therefore, the objective function of spectral clustering can be defined as

$$\min_{F^T F = I} \operatorname{Tr}(F^T L F), \tag{1}$$

where $L = I - D^{-1/2} S D^{-1/2}$, $S$ denotes the similarity graph of the original instances in which an element $S_{ij}$ is the similarity between $X_{i.}$ and $X_{j.}$, $D$ is the degree matrix with diagonal elements $D_{ii} = \sum_j S_{ij}$, and clustering results are obtained by performing k-means clustering on the embedding $D^{-1/2} F$.

Kuang et al. [24] proposed SymNMF on the basis of nonnegative matrix factorization, which can simplify the high-dimensional graph structure to a low-dimensional embedding while keeping the information as unchanged as possible. The objective function of SymNMF is defined as

$$\min_{F \ge 0} \|G - F F^T\|_F^2, \tag{2}$$

where $G$ denotes the symmetric high-dimensional data. Relaxing $F$ to be orthogonal, the above problem becomes the problem of spectral clustering. Since the solution of (2) is difficult to compute, Han et al. [25] first transformed the problem of spectral clustering into the problem of symmetric nonnegative matrix factorization and then introduced an auxiliary nonnegative variate $H$ to approximate the orthogonal variate $F$:

$$\min_{H \ge 0,\; F^T F = I} \|G - H F^T\|_F^2, \tag{3}$$

where $H$ is a nonnegative embedding. With orthogonal and nonnegative constraints, the reconstructed graph of ONMF naturally has a clear cluster structure. It can be seen that ONMF can simplify the calculation by orthogonalizing $F$ while ensuring clustering accuracy.
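As a concrete illustration of the spectral clustering step described above, the following minimal sketch builds the normalized Laplacian $L = I - D^{-1/2} S D^{-1/2}$ and takes the eigenvectors of its smallest eigenvalues as the spectral embedding. The function names `normalized_laplacian` and `spectral_embedding` and the toy graph are illustrative assumptions, not part of the original work.

```python
import numpy as np

def normalized_laplacian(S):
    """Return L = I - D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S."""
    degrees = S.sum(axis=1)
    inv_sqrt = np.zeros_like(degrees, dtype=float)
    positive = degrees > 0                      # isolated nodes keep a zero scale
    inv_sqrt[positive] = 1.0 / np.sqrt(degrees[positive])
    return np.eye(S.shape[0]) - inv_sqrt[:, None] * S * inv_sqrt[None, :]

def spectral_embedding(S, n_clusters):
    """Columns of F are eigenvectors of L for the n_clusters smallest eigenvalues."""
    lap = normalized_laplacian(S)
    eigenvalues, eigenvectors = np.linalg.eigh(lap)  # eigh sorts eigenvalues ascending
    return eigenvectors[:, :n_clusters], lap

# Toy similarity graph (assumed for illustration): two disconnected triangles.
S = np.zeros((6, 6))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
np.fill_diagonal(S, 0.0)

F, lap = spectral_embedding(S, n_clusters=2)
objective = np.trace(F.T @ lap @ F)  # value of Tr(F^T L F) at the minimizer
```

With two disconnected components, the two smallest eigenvalues of $L$ are zero, so the objective vanishes and the embedding separates the components.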

Proposed Frameworks
In this paper, a robust graph factorization framework (RGF-MEC) is proposed for multivariate electricity consumption clustering, in which graph factorization and outlier discovery are performed at the same time.
A multivariate electricity consumption series dataset is denoted as $\{ME^{(i)}\}_{i=1}^{N}$, where $N$ denotes the number of instances, $M$ denotes the number of sequences in $ME^{(i)}$, and $D$ denotes the length of each sequence in $ME^{(i)}$. RGF-MEC must first calculate the similarity graph structure $G^{ME}$ of $\{ME^{(i)}\}_{i=1}^{N}$. If $ME^{(j)}$ belongs to the $k$-nearest neighbors of $ME^{(i)}$, the element $G^{ME}_{ij}$ of $G^{ME}$ takes the value given by (5), where $ME'^{(k+1)}$ is the $(k+1)$th nearest neighbor of $ME^{(i)}$; otherwise, $G^{ME}_{ij}$ is equal to zero. Many existing works can reduce the dimensionality of multivariate time series, such as CPCA [20], VPCA [17], and deep encoder networks [21]. RGF-MEC can reduce the dimensionality of the data to obtain more effective features before calculating the graph structure. However, in order to verify the robustness of the proposed model, this paper does not perform a dimensionality reduction operation.
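The graph construction can be sketched as follows. The weighting rule of (5) is not reproduced here; as a stand-in, this sketch assumes the widely used adaptive-neighbor weight $(d_{i,k+1} - d_{ij}) / (k\,d_{i,k+1} - \sum_{h=1}^{k} d_{ih})$, which also depends on the $(k+1)$th nearest neighbor, together with squared Euclidean distances on flattened instances and a final symmetrization. All of these choices, and the name `knn_similarity_graph`, are assumptions made for illustration.

```python
import numpy as np

def knn_similarity_graph(instances, k):
    """Build a symmetric k-nearest-neighbor graph from an (N, M, D) array.

    Assumed weighting (not taken from the paper): adaptive-neighbor weights
    based on the distance to the (k+1)th nearest neighbor. Requires k <= N - 2.
    """
    n = instances.shape[0]
    flat = instances.reshape(n, -1)
    sq_norms = (flat ** 2).sum(axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    dist = np.maximum(dist, 0.0)        # clip tiny negatives from rounding
    np.fill_diagonal(dist, np.inf)      # an instance is not its own neighbor

    graph = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = order[:k]
        d_next = dist[i, order[k]]      # distance to the (k+1)th neighbor
        denom = k * d_next - dist[i, neighbors].sum()
        if denom > 0:
            graph[i, neighbors] = (d_next - dist[i, neighbors]) / denom
        else:
            graph[i, neighbors] = 1.0 / k   # all k+1 distances coincide
    return (graph + graph.T) / 2.0      # symmetrize

# Synthetic data (assumed): 10 instances, 4 sequences each, length 20.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4, 20))
G = knn_similarity_graph(X, k=3)
```

Each row of the unsymmetrized graph sums to one, so the entries of the symmetrized graph sum to the number of instances.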
RGF-MEC takes outliers into consideration while performing matrix factorization on the similarity graph $G^{ME}$. The objective function of RGF-MEC can be defined as

$$\min_{H^{ME} \ge 0,\; F^T F = I} \sum_{n=1}^{N} \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2 + \frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}, \tag{6}$$

where $H^{ME}$ is the nonnegative class-related embedding, $F$ is the orthogonal spectral embedding, $\lambda$ is a regularization parameter, $RG_n = \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2$ denotes the reconstruction error of the $n$th multivariate electricity consumption series, and $\varepsilon$ is a parameter to filter outliers. The parameter $\varepsilon$ is determined according to the reconstruction errors and a hypothetical ratio $r$ of outliers: the errors $\{\|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2\}_{n=1}^{N}$ are sorted from large to small, and the reconstruction error at the rank given by $r$ is assigned to $\varepsilon$. If $RG_n \le \varepsilon$, then $\mathbb{1}_{RG_n \le \varepsilon} = 1$; otherwise, $\mathbb{1}_{RG_n \le \varepsilon} = 0$. The objective function of RGF-MEC consists of two parts, the robust graph factorization term and the graph regularization term. In other words, $H^{ME}$ also satisfies the graph constraint of $\{ME^{(i)}\}_{i=1}^{N}$. In (6), RGF-MEC utilizes a squared F-norm to constrain the robust graph factorization term; this variant is named RGF-MEC with squared F-norm (RGF-MEC$_{F2}$). Researchers have shown that $\ell_{2,1}$-norm nonnegative matrix factorization is effective and has a mathematical meaning [26][27][28]. Therefore, we also present RGF-MEC with $\ell_{2,1}$-norm (RGF-MEC$_{\ell_{2,1}}$). The objective function of RGF-MEC$_{\ell_{2,1}}$ can be defined as

$$\min_{H^{ME} \ge 0,\; F^T F = I} \sum_{n=1}^{N} \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F + \frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}. \tag{7}$$

The main difference between RGF-MEC$_{F2}$ and RGF-MEC$_{\ell_{2,1}}$ is the norm of the robust graph factorization term. We provide the learning procedures in the following two sections and then give a convergence analysis of RGF-MEC.
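The outlier filter can be made concrete with a small sketch. It computes the row-wise reconstruction errors $RG_n$, picks $\varepsilon$ from the sorted errors, and returns the indicator values. The exact rank convention is an assumption here: the sketch treats the $\lfloor rN \rfloor$ largest errors as outliers and sets $\varepsilon$ to the largest error that is still kept. The name `outlier_mask` is hypothetical.

```python
import numpy as np

def outlier_mask(G, H, F, r):
    """Return (mask, eps, errors) where mask[n] = 1 if RG_n <= eps, else 0.

    Assumption: floor(r * N) instances with the largest reconstruction
    errors are treated as outliers.
    """
    residual = G - H @ F.T
    errors = (residual ** 2).sum(axis=1)          # RG_n for each row n
    n = errors.shape[0]
    n_outliers = int(np.floor(r * n))
    sorted_desc = np.sort(errors)[::-1]           # errors from large to small
    eps = sorted_desc[min(n_outliers, n - 1)]     # largest error that is kept
    mask = (errors <= eps).astype(float)
    return mask, eps, errors

# Toy example (assumed): H = 0, so RG_n is the squared norm of row n of G.
G = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
H = np.zeros((5, 2))
F = np.eye(5)[:, :2]
mask, eps, errors = outlier_mask(G, H, F, r=0.4)
# errors are [1, 4, 9, 16, 25]; the two largest are filtered out, eps = 9
```

Because the mask depends on the current $H^{ME}$ and $F$, it would be recomputed whenever the embeddings change.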

Learning Procedure of RGF-MEC$_{F2}$
RGF-MEC$_{F2}$ can make use of the coordinate descent method to solve problem (6), where the two embeddings $H^{ME}$ and $F$ are updated in turn. A detailed learning procedure of RGF-MEC$_{F2}$ is provided in Algorithm 1.

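Fix $F$ and Update $H^{ME}$.
If $F$ is fixed, problem (6) will become

$$\min_{H^{ME} \ge 0} \sum_{n=1}^{N} \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2 + \frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}. \tag{8}$$

Setting the derivative of problem (8) equal to zero yields equation (9), where $\Lambda_{\varepsilon}$ is a diagonal matrix with $n$th diagonal element $\mathbb{1}_{RG_n \le \varepsilon}$ and $L^{ME}$ is the Laplacian matrix of $G^{ME}$. Therefore, the solution of problem (8) is given by (10).

Fix $H^{ME}$ and Update $F$.
If $H^{ME}$ is fixed, problem (6) will become

$$\min_{F^T F = I} \sum_{n=1}^{N} \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2. \tag{11}$$

Due to $\mathbb{1}_{RG_n \le \varepsilon} \in \{0, 1\}$ and $F^T F = I$, problem (11) can be written as

$$\max_{F^T F = I} \operatorname{Tr}\big(F^T (G^{ME})^T \Lambda_{\varepsilon} H^{ME}\big), \tag{12}$$

which is the standard orthogonal Procrustes problem. Defining the singular value decomposition of $(G^{ME})^T \Lambda_{\varepsilon} H^{ME}$ as $U S V^T$, the solution of problem (11) is obtained:

$$F = U V^T. \tag{13}$$

Learning Procedure of RGF-MEC$_{\ell_{2,1}}$
RGF-MEC$_{\ell_{2,1}}$ solves problem (7) in the same alternating manner, where the two embeddings $H^{ME}$ and $F$ are updated in turn.

Fix $F$ and Update $H^{ME}$.
If $F$ is fixed, problem (7) will become

$$\min_{H^{ME} \ge 0} \sum_{n=1}^{N} \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F + \frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}. \tag{14}$$

Setting the derivative of problem (14) equal to zero yields equation (15), where $a_n = 1 / \big(2 \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F\big)$. If $a_n$ is stationary, problem (14) can be written as

$$\min_{H^{ME} \ge 0} \sum_{n=1}^{N} a_n \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2 + \frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}. \tag{16}$$

Setting the derivative of problem (16) equal to zero yields equation (17), where $\Lambda_{a\varepsilon}$ is a diagonal matrix with $n$th diagonal element $a_n \mathbb{1}_{RG_n \le \varepsilon}$. Therefore, a solution to problem (14) is given by (18).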
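The stationarity conditions behind the two $H^{ME}$ updates can be sketched as follows. This is a derivation under stated assumptions rather than a reproduction of equations (9), (10), (17), and (18): it assumes that $G^{ME}$ is symmetric, that $L^{ME} = D^{ME} - G^{ME}$ is the unnormalized Laplacian with $D^{ME}$ the diagonal degree matrix, and it ignores the nonnegativity constraint while differentiating. The symbol $\Lambda$ stands for $\Lambda_{\varepsilon}$ in the squared F-norm case and for $\Lambda_{a\varepsilon}$ in the reweighted $\ell_{2,1}$ case, and superscripts $ME$ are dropped for brevity.

```latex
\begin{align*}
\frac{\lambda}{2} \sum_{i,j} \|H_{i.} - H_{j.}\|_F^2 \, G_{ij}
  &= \lambda \operatorname{Tr}(H^T L H), \\
J(H) &= \operatorname{Tr}\!\big((G - H F^T)^T \Lambda \,(G - H F^T)\big)
        + \lambda \operatorname{Tr}(H^T L H), \\
\frac{\partial J}{\partial H}
  &= -2 \Lambda (G - H F^T) F + 2 \lambda L H
   = 2 (\Lambda + \lambda L) H - 2 \Lambda G F
   \qquad (\text{using } F^T F = I), \\
\frac{\partial J}{\partial H} = 0
  &\iff (\Lambda + \lambda L) H = \Lambda G F .
\end{align*}
```

When $\Lambda + \lambda L$ is invertible, $H = (\Lambda + \lambda L)^{-1} \Lambda G F$ is the unconstrained stationary point. The handling of the constraint $H^{ME} \ge 0$ (for example, by projection onto the nonnegative orthant) is left open in this sketch.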

Fix $H^{ME}$ and Update $F$.
If $H^{ME}$ is fixed, problem (7) will become

$$\min_{F^T F = I} \sum_{n=1}^{N} \mathbb{1}_{RG_n \le \varepsilon} \, \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F. \tag{19}$$

Similarly, problem (19) can be written as

$$\max_{F^T F = I} \operatorname{Tr}\big(F^T (G^{ME})^T \Lambda_{a\varepsilon} H^{ME}\big), \tag{20}$$

which is the standard orthogonal Procrustes problem. Defining the singular value decomposition of $(G^{ME})^T \Lambda_{a\varepsilon} H^{ME}$ as $U' S' V'^T$, a solution to problem (19) is obtained:

$$F = U' V'^T. \tag{21}$$

Input: a multivariate electricity consumption series dataset $\{ME^{(i)}\}_{i=1}^{N}$, the parameters $\lambda$, $r$.
Output: the class-related embedding $H^{ME}$.
1: Calculate the similarity graph structure $G^{ME}$ of $\{ME^{(i)}\}_{i=1}^{N}$ based on (5).
2: Initialize $F$, $H^{ME}$, and $\varepsilon$ in turn.
3: while not convergent do
     % $H^{ME}$ is updated based on (18)

Convergence Analysis.
In RGF-MEC$_{F2}$, the two embeddings $H^{ME}$ and $F$ are updated in turn. When updating $H^{ME}$ and $F$, we know that (10) (i.e., the calculation of $H^{ME}$) is the local optimal solution of (8) and (13) (i.e., the calculation of $F$) is the local optimal solution of (11). Therefore, RGF-MEC$_{F2}$ converges to its local optimal solution. In RGF-MEC$_{\ell_{2,1}}$, the two embeddings $H^{ME}$ and $F$ are updated in turn, too. It can be seen that (18) (i.e., the calculation of $H^{ME}$) is the local optimal solution of (16) instead of (14).

Theorem 1. If (18) (i.e., the calculation of $H^{ME}$) is a solution to (14), the objective value of (14) will not increase.
Proof. Suppose $e_n = \|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F^2$ and $a_n = 1 / (2\sqrt{e_n})$. In RGF-MEC$_{\ell_{2,1}}$, the problem of minimizing $\sqrt{e_n}$ in (14) is transformed into the problem of minimizing $a_n e_n$ in (16). Suppose that $a_n$ is stationary and $g(H) = \frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}$; then (18) is a local optimal solution of (16). That is,

$$\sum_n a_n^{t} e_n^{t+1} \mathbb{1}_{RG_n \le \varepsilon} + g(H^{t+1}) \le \sum_n a_n^{t} e_n^{t} \mathbb{1}_{RG_n \le \varepsilon} + g(H^{t})$$
$$\Leftrightarrow \sum_n \frac{e_n^{t+1}}{2\sqrt{e_n^{t}}} \mathbb{1}_{RG_n \le \varepsilon} + g(H^{t+1}) \le \sum_n \frac{e_n^{t}}{2\sqrt{e_n^{t}}} \mathbb{1}_{RG_n \le \varepsilon} + g(H^{t}).$$

Therefore, the objective value of (14) will not increase and Theorem 1 is proven.
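The step from the reweighted inequality to the monotonicity of (14) can be closed with a standard argument, sketched here under two assumptions: $e_n^{t} > 0$ for every retained instance, and the indicator $\mathbb{1}_{RG_n \le \varepsilon}$ is held fixed during the update. The symbols $u$ and $v$ are introduced only for this sketch.

```latex
\begin{align*}
&(\sqrt{u} - \sqrt{v})^2 \ge 0
  \;\Longrightarrow\;
  \sqrt{u} - \frac{u}{2\sqrt{v}} \le \sqrt{v} - \frac{v}{2\sqrt{v}},
  \qquad u \ge 0,\ v > 0, \\[4pt]
&\text{with } u = e_n^{t+1},\ v = e_n^{t}: \quad
  \sum_n \mathbb{1}_{RG_n \le \varepsilon}
     \Big( \sqrt{e_n^{t+1}} - \frac{e_n^{t+1}}{2\sqrt{e_n^{t}}} \Big)
  \le
  \sum_n \mathbb{1}_{RG_n \le \varepsilon}
     \Big( \sqrt{e_n^{t}} - \frac{e_n^{t}}{2\sqrt{e_n^{t}}} \Big), \\[4pt]
&\text{and adding the reweighted inequality gives} \quad
  \sum_n \mathbb{1}_{RG_n \le \varepsilon} \sqrt{e_n^{t+1}} + g(H^{t+1})
  \le
  \sum_n \mathbb{1}_{RG_n \le \varepsilon} \sqrt{e_n^{t}} + g(H^{t}).
\end{align*}
```

The last line states that the objective of (14) at iteration $t+1$ does not exceed its value at iteration $t$.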
It can also be seen that (21) (i.e., the calculation of $F$) is a local optimal solution of (19). In other words, RGF-MEC$_{\ell_{2,1}}$ converges to its local optimal solution, too.
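The alternating procedure described in this section can be sketched end to end as follows. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions: a random orthonormal initialization of $F$, a fixed number of iterations instead of a convergence test, the unnormalized Laplacian of a symmetric graph, a least-squares solve of $(\Lambda + \lambda L) H = \Lambda G F$ followed by clipping at zero to enforce nonnegativity, $\varepsilon$ chosen so that the $\lfloor rN \rfloor$ largest errors are filtered, and cluster labels read off as the largest entry in each row of $H^{ME}$. The name `rgf_mec_sketch` and the toy graph are hypothetical.

```python
import numpy as np

def rgf_mec_sketch(G, n_clusters, lam=1.0, r=0.1, variant="l21", n_iter=30, seed=0):
    """Alternate H and F updates on a symmetric graph G (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    lap = np.diag(G.sum(axis=1)) - G                  # assumed: unnormalized Laplacian
    F, _ = np.linalg.qr(rng.normal(size=(n, n_clusters)))  # orthonormal columns
    H = np.maximum(G @ F, 0.0)                        # assumed initialization
    weights = np.ones(n)
    for _ in range(n_iter):
        # Reconstruction errors RG_n and the outlier threshold eps.
        errors = ((G - H @ F.T) ** 2).sum(axis=1)
        n_outliers = int(np.floor(r * n))
        eps = np.sort(errors)[::-1][min(n_outliers, n - 1)]
        weights = (errors <= eps).astype(float)       # indicator values
        if variant == "l21":
            # Reweighting a_n = 1 / (2 sqrt(e_n)); small constant avoids division by zero.
            weights = weights / (2.0 * np.sqrt(errors) + 1e-12)
        # H-step: solve (Lambda + lam * L) H = Lambda G F, then clip at zero (assumption).
        system = np.diag(weights) + lam * lap
        rhs = weights[:, None] * (G @ F)
        H = np.maximum(np.linalg.lstsq(system, rhs, rcond=None)[0], 0.0)
        # F-step: orthogonal Procrustes via the SVD of G^T Lambda H.
        U, _, Vt = np.linalg.svd(G.T @ (weights[:, None] * H), full_matrices=False)
        F = U @ Vt
    return H, F, weights > 0

# Toy graph (assumed): two noisy blocks of four nodes each.
rng = np.random.default_rng(1)
blocks = np.kron(np.eye(2), np.ones((4, 4)))
noise = 0.05 * rng.random((8, 8))
G = blocks + (noise + noise.T) / 2.0
np.fill_diagonal(G, 0.0)

H, F, kept = rgf_mec_sketch(G, n_clusters=2, lam=0.1, r=0.25)
labels = H.argmax(axis=1)   # assumed way of reading clusters from H
```

Passing `variant="f2"` skips the reweighting and corresponds to the squared F-norm case. The `kept` array marks the instances that were not filtered as outliers in the last iteration.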

Experiments
In this section, experimental results on multiple multivariate time-series datasets and a multivariate electricity consumption series dataset are used to validate the effectiveness of the proposed RGF-MEC framework. The eight real-world multivariate time-series datasets comprise the dataset of [29], BasicMotions [30], Epilepsy [31], ERing [32], Libras [33], NATOPS [34], StandWalkJump [35], and UWaveGestureLibrary [36]. Among these multivariate time-series datasets, the maximum number of sequences is 24, the maximum sequence length is 2500, the maximum number of instances is 575, and the maximum number of classes is 25.

Performance Comparison.
We first compare the performance of RGF-MEC and other contrast algorithms on real-world multivariate time-series datasets. Performance comparisons between RGF-MEC and other contrast algorithms are reported in Tables 1 and 2, where the best results are highlighted in bold. RGF-MEC outperforms the other contrast algorithms on most real-world multivariate time-series datasets, achieving the best result on five of the eight real-world datasets in terms of RI or NMI. We can also draw the following conclusions from Tables 1 and 2. (1) RGF-MEC is clearly better than the other contrast algorithms in terms of "Mean ± Std," which demonstrates that RGF-MEC is an effective multivariate time-series clustering method. (2) The performance of NESE is best on the Epilepsy dataset, which indicates that the series contain complementary information to each other. If each series is sufficient to represent the similarity between multivariate time series, then the multiview model can make full use of the information of each series and achieve good performance.
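For readers unfamiliar with the RI metric used in these comparisons, the Rand index counts the fraction of instance pairs on which two labelings agree, that is, pairs placed together in both or apart in both. The following minimal sketch uses only the standard library; the function name `rand_index` and the toy labelings are illustrative assumptions.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of instance pairs on which the two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two instances")
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same partition with permuted label names: full agreement.
perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that cuts across the true one: only 2 of the 6 pairs agree.
mixed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to how cluster labels are named, which is why it suits clustering evaluation where label identities are arbitrary.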

Mathematical Problems in Engineering
We then compare the performance of RGF-MEC$_{F2}$ and RGF-MEC$_{\ell_{2,1}}$ on real-world multivariate time-series datasets. Table 3 gives performance comparisons between RGF-MEC$_{F2}$ and RGF-MEC$_{\ell_{2,1}}$. It can be seen that RGF-MEC$_{\ell_{2,1}}$ is clearly better than RGF-MEC$_{F2}$ in terms of "Mean ± Std." In other words, RGF-MEC$_{\ell_{2,1}}$ achieves mean improvements of 0.74% RI and 2.31% NMI compared to RGF-MEC$_{F2}$. Compared with RGF-MEC$_{F2}$, RGF-MEC$_{\ell_{2,1}}$ needs to compute the weights $\{a_n = 1 / (2\|G^{ME}_{n.} - H^{ME}_{n.} F^T\|_F)\}_{n=1}^{N}$ when updating the nonnegative class-related embedding $H^{ME}$. Because each weight $a_n$ is inversely proportional to the reconstruction error norm, instances with larger reconstruction errors receive smaller weights. This may make RGF-MEC$_{\ell_{2,1}}$ more robust on most multivariate time-series datasets.
We also perform ablation studies on RGF-MEC. The objective function of RGF-MEC consists of two parts, the robust graph factorization term and the graph regularization term. Table 4 gives the performance comparisons between the ablation models and RGF-MEC, where Ablation$_{F2}$ denotes RGF-MEC$_{F2}$ without the graph regularization term and Ablation$_{\ell_{2,1}}$ denotes RGF-MEC$_{\ell_{2,1}}$ without the graph regularization term. It can be seen that RGF-MEC$_{\ell_{2,1}}$ (resp. RGF-MEC$_{F2}$) outperforms Ablation$_{\ell_{2,1}}$ (resp. Ablation$_{F2}$) on most multivariate time-series datasets, which indicates that the graph regularization term enforces the nonnegative embedding $H^{ME}$ to satisfy the graph constraint and retain more class-related information.
Finally, we analyze the impact of the parameters $r$ and $\lambda$ on the algorithm performance. Figure 1 shows how the performance (i.e., NMI) of RGF-MEC$_{\ell_{2,1}}$ changes on four multivariate time-series datasets as $r$ increases. It can be seen that RGF-MEC$_{\ell_{2,1}}$ takes outliers into consideration and achieves better clustering results on the four multivariate time-series datasets. Figure 2 shows how the performance (i.e., NMI) of RGF-MEC$_{\ell_{2,1}}$ changes on four multivariate time-series datasets as $\lambda$ increases. It can be seen that RGF-MEC$_{\ell_{2,1}}$ takes the graph constraint of the nonnegative embedding (i.e., $\frac{\lambda}{2} \sum_{i,j} \|H^{ME}_{i.} - H^{ME}_{j.}\|_F^2 \, G^{ME}_{ij}$) into consideration and achieves better clustering results on most multivariate time-series datasets, which is consistent with the results of the ablation study. Next, as shown in Figure 3, we use t-SNE [39] to visualize the class-related embedding $H^{ME}$ of RGF-MEC$_{\ell_{2,1}}$ on UWaveGestureLibrary. It can be seen that some classes have obvious outliers, such as class-4, class-6, and class-8. RGF-MEC$_{\ell_{2,1}}$ takes outliers into consideration and performs better than the other contrast algorithms.

Experiments on Multivariate Electricity Consumption Series Dataset

Multivariate Electricity Consumption Series Dataset.
A multivariate electricity consumption series dataset of a region in China Southern Power Grid [5], denoted as the CSPG dataset, is used to evaluate the proposed RGF-MEC framework. CSPG contains four series: global-active power (household global minute-averaged active power), global-reactive power (household global minute-averaged reactive power), voltage (minute-averaged voltage), and global intensity (household global minute-averaged current intensity). In addition, it contains multivariate series for two …
Table 5 shows performance comparisons between the combination containing four series and combinations containing arbitrary two series. Compared with the combinations containing arbitrary two series, RGF-MEC achieves the best results on the combination containing four series. Next, we analyze the impact of the parameters $r$ and $\lambda$ on CSPG. Figure 4 shows how the performance (i.e., RI and NMI) of RGF-MEC changes on CSPG as $r$ increases. It can be seen that RGF-MEC takes outliers into consideration and achieves better clustering results on CSPG. Figure 5 shows how the performance (i.e., RI and NMI) of RGF-MEC changes on CSPG as $\lambda$ increases.
Figure 4: Performance (i.e., RI and NMI) change process of RGF-MEC on the CSPG dataset as $r$ increases.

Conclusion
This paper builds an effective robust graph factorization framework for multivariate electricity consumption clustering, which performs robust matrix factorization on the similarity graph of multivariate electricity consumption series and obtains nonnegative class-related representations. The proposed RGF-MEC framework also enforces the nonnegative embedding to satisfy a graph constraint and retain more class-related information. In addition, RGF-MEC utilizes the squared F-norm and the $\ell_{2,1}$-norm to constrain the robust graph factorization term, yielding the variants RGF-MEC$_{F2}$ and RGF-MEC$_{\ell_{2,1}}$. Experimental results demonstrate that RGF-MEC$_{F2}$ and RGF-MEC$_{\ell_{2,1}}$ achieve competitive results on multiple multivariate time-series datasets and a multivariate electricity consumption series dataset.

Data Availability
Eight realistic multivariate time-series datasets are deposited in a public repository (http://www.timeseriesclassification.com/dataset.php). The multivariate electricity consumption series dataset used to support the findings of this study is available from the corresponding author upon request.
Figure 5: Performance (i.e., RI and NMI) change process of RGF-MEC on the CSPG dataset as $\lambda$ increases, where algorithms with NMI less than 30% cannot be displayed in this figure.