Learning matrix profile method for discord-based attribution of electricity consumption pattern behavior

Abstract This paper analyzes complex electricity load measurements in order to attribute consumption behavior to a specific entity. Attribution may be seen as the first step in several grid-based activities, such as energy management, privacy protection, and the identification of illicit activities. This work proposes and tests a novel approach that uses consumption discords as the analysis carrier for attributing multiple load patterns to specific consumers. The discord-driven analysis combines the Matrix Profile (MP) with each of two well-established supervised classification methods: K-Nearest Neighbors (KNN) and Support Vector Machine (SVM). The proposed approach is applied to attributing electricity consumption pattern behaviors to a set of academic institutions, each comprising multiple academic units. Notably, units within the same entity exhibit different consumption behaviors, which makes it highly challenging to attribute a unit's consumption behavior to its entity of origin when there are many candidate entities. The results demonstrate that the discord-based MP-KNN and MP-SVM combinations achieve higher identification accuracy than the state-of-the-art method of supervised classification of the full load patterns with a feed-forward artificial neural network (ANN). The identification accuracy of the proposed method exceeded that of the ANN load classification by approximately 13 percentage points.


Introduction
From a data standpoint, electricity load measurements may constitute a useful source of information for a wide spectrum of applications. Of special policy interest are cases where electricity profiles may be used to make insightful inferences about the daily activities and behavior of electricity consumers, for instance, characterizing whether the operation of a facility is legitimate, detecting electricity theft, or identifying a hidden consuming unit within specific premises.
Notably, the plethora of existing data mining and machine learning (ML) tools enables the implementation of various strategies for analyzing load patterns and extracting insightful information (Capozzoli et al., 2016; Molina-Solana et al., 2017). For instance, ML analytics enable various energy management systems and diagnostic and optimization strategies at the consumer level, either by extracting and classifying normal and anomalous behaviors or by enhancing energy savings and sustainability in smart buildings (Yu et al., 2016).
On the one hand, the advent of smart grid technologies has equipped consumers with new capabilities (e.g., generation from renewables, selling excess energy); on the other hand, consumption behavior has changed significantly. Consumers have departed markedly from the traditional model profiles and now exhibit increasingly complex behavior (Alamaniotis et al., 2019). However, this complex behavior may mask simple but illicit consumption activities. An example is the illegal operation of an unlicensed commercial building that appears as a residence. More generally, identification via electricity load may help authorities uncover illegal activities that pose a threat to national security (Alamaniotis & Tsoukalas, 2016). The identification challenge only increases for consumption entities comprised of multiple smaller consuming units, i.e., when the entity's consumption may be broken down into multiple smaller ones.
It should be noted that attributing complex multiple load measurements to a specific consumer provides a way to identify the consumer and to verify or infer its (possibly multiple) consumption activities. Of particular interest is the attribution of complex behaviors within a list of diverse consumers such as academic campuses: each campus typically comprises various buildings (an example is given in Figure 1), and overall they present highly complex daily load curves. Therefore, robust technologies for discovering, extracting, and classifying usual and unusual complex load profiles of grid participants need to be developed. In the field of energy systems, the most widely applied data mining approaches are based on three main methods: i) association rule mining (ARM), ii) clustering analysis, and iii) the discovery of similar behavioral patterns (Zhao et al., 2020). Driven by recent concerns and efforts in identifying consumers via load data, the current work presents a new method for efficiently attributing complex consumption patterns based on load discords extracted from load profiles, and applies it to academic institutes.
Thus, the main contributions of the paper are the following:
• introduction of a novel consumption attribution method that uses load discords (i.e., load anomalies) instead of the full pattern behavior;
• development of a new intelligent system that combines the Matrix Profile (MP) technique with supervised learning tools;
• application of the novel approach to real-world data taken from multiple electricity consumption patterns of various units within academic institutes.
It should be noted that, in terms of computational time and complexity, the Matrix Profile is the most suitable method for time-series data mining because it offers fast (very short processing times) and highly reliable results (i.e., very few false positives), given that it is less prone to load data variability (Nichiforov & Alamaniotis, 2021).
The rest of the paper is structured as follows. Section II presents the background of the methods used for load pattern attribution and analysis. Section III discusses the developed approach, while Section IV presents the results obtained from discord-based attribution together with the benchmark results. Finally, Section V concludes the paper and summarizes its main points.

Related work
The current work focuses on attributing complex behavior to specific consumers based on the discords found in their load patterns. It should be noted that load data have been used for attribution and identification purposes in various capacities and toward different aims.
A few works focus on load pattern identification with the goal of improving building energy management. In (Panapakidis et al., 2014), the authors focus on pattern recognition in the load curve analysis of buildings. They propose a comprehensive methodology based on clustering algorithms to analyze and classify the electricity consumption behavior of nine buildings on the Aristotle University of Thessaloniki (AUTH) campus in Greece. The purpose of their research is to acquire useful information on electricity behavior and to identify opportunities for improving the energy efficiency and operation of the buildings. In (Qi et al., 2017), a two-fold method that extracts and identifies load patterns in data obtained from 800 different power consumers is presented. First, the fuzzy C-means (FCM) clustering algorithm groups loads that share a similar curve shape into clusters. Next, the classification and regression trees (CART) method is employed to classify each pattern into one of seven consumption behavior patterns. The authors claim that the method can efficiently recognize the load behavior of newly connected power system consumers for which no prior information is available.
Furthermore, several works introduce methodologies for anomalous pattern identification. In (Zhao et al., 2020), a group of such methods based on supervised and unsupervised data mining is presented. In particular, these methods are used for anomaly detection in load patterns, consumer pattern identification, and fault detection and diagnosis in building energy systems. The authors present the advantages and disadvantages of the methods and conclude that the available techniques are not fully developed and that data mining-based methods that are universal, automatic, and knowledge-driven still need to be developed.
In (Capozzoli et al., 2018), the authors concentrate on learning load patterns and detecting anomalies to improve energy management in smart buildings. They propose a methodology that combines data reduction and transformation with machine learning tools to detect unusual patterns. Their tests are conducted on two case studies, which refer to the total electrical energy consumption of i) a town hall in Spain and ii) part of the Politecnico di Torino campus in Italy. Another work by the same authors (Piscitelli et al., 2021) focuses on anomaly detection in academic buildings. Specifically, they propose a novel methodology for developing a reference model to detect unusual daily electrical energy consumption patterns on a university campus in Turin, Italy. A more comprehensive recent work (Liu et al., 2021) focuses on the identification of daily electricity usage patterns and anomaly detection in building electricity consumption data; it discusses an ensemble of clustering methods for identifying daily electricity consumption patterns in various buildings. With this methodology, the authors were able to detect anomalous load profiles by analyzing whole-building data.
Other techniques use data-driven methods to identify embedded load profiles. In (Bourdeau et al., 2021), results on the classification of academic buildings' daily electric load profiles are presented. The authors implemented the K-means algorithm in three ways: feature-based clustering with the Manhattan distance, Euclidean distance clustering of the daily electric load profile time series, and clustering with Dynamic Time Warping. The implementations are tested on load time series taken from 14 buildings on a university campus in Paris. Notably, their method is not applied to whole-building identification but exclusively to load profile classification. In (Alamaniotis & Tsoukalas, 2016), a new data-driven anticipatory system that uses fuzzy logic and Gaussian processes is introduced for identifying excess undeclared consumption. This method focuses on analyzing load data for security purposes.
From the aforementioned literature, we observe that one promising research direction for analyzing complex load data and attributing their behavior to specific consumers is the development of advanced data mining techniques that can efficiently extract in-depth information, e.g., the identification of unusual behavior in load patterns. Most of the existing literature focuses on load pattern identification and on classifying consumers into one of several known, predetermined classes. The current research fills the gap of attributing a complex behavior to a specific consumer. To that end, the synergism of the Matrix Profile with supervised learning is presented as an advanced data mining technique for consumption behavior attribution, and its validity is demonstrated in the analysis of complex academic consumption patterns.

Method
The proposed consumption behavior attribution method consists of two steps, illustrated in Figure 2. The first step extracts discords by processing the load data, while the second step performs consumer identification using a supervised classification model.
First, pattern discords are extracted by computing the Matrix Profile of the available load time series and then extracting the top k discords (i.e., anomalies) from it. Next, the identified discords are used as input to a supervised classification model; in the current work, the K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classifiers are used. The purpose of the method is to efficiently attribute a complex consumption behavior to an entry of a list of known consumers. In the current work, the list of complex consumers is populated with various academic institutions.

Matrix profile
Matrix Profile is a data mining technique first introduced in 2016 (C.C. M. Yeh et al., 2016). The technique adds value to the research community by providing a significant feature selection method for time series. The most illustrative applications of this technique include data visualization, chain discovery, the discovery of similar patterns among time series (i.e., motif discovery), and the identification of unusual sequences or anomalies in data (i.e., discord discovery). The Matrix Profile is attracting increasing attention within the research community, and its list of applications continues to grow. Appreciated features of the method include scalability, dimensionality reduction, reduced training time, and ease of use, as it requires only one parameter to be tuned, namely the MP window length.
According to (M. C. C. Yeh, 2018), the Matrix Profile of a time series $T \in \mathbb{R}^{n}$ is a new time series $Q \in \mathbb{R}^{n-m+1}$ obtained by storing the z-normalized Euclidean distance between each subsequence (i.e., window) $T_m$ of $T$ and its nearest neighboring window, where $m \in \mathbb{N}$ is the window length and $n \in \mathbb{N}$ the time-series length. The z-normalized Euclidean distance on which the Matrix Profile is built is given by:

$$d(X, Y) = \sqrt{\sum_{i=1}^{m} \left( \frac{x_i - \mu_X}{\sigma_X} - \frac{y_i - \mu_Y}{\sigma_Y} \right)^{2}}$$

where $X, Y \in \mathbb{R}^{m}$ are two subsequences, and $\mu_X, \mu_Y$ and $\sigma_X, \sigma_Y$ are their means and standard deviations, respectively:

$$\mu_X = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_X = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( x_i - \mu_X \right)^{2}}$$

with $\mu_Y$ and $\sigma_Y$ defined analogously. Given that MP has proven efficient in discord discovery for multivariate time series (Guo et al., 2003), in this paper we apply it to discord detection in complex consumption data. Considering four subsequences of length $m$, namely $T_a$ with its non-self match $T_b$, and $T_c$ with its non-self match $T_d$, $T_a$ is considered the top discord if:

$$\min_{b} \operatorname{dist}(T_a, T_b) > \min_{d} \operatorname{dist}(T_c, T_d) \quad \text{for all } c \neq a$$

where dist is the z-normalized Euclidean distance function, $a \neq b$, and $c \neq d$.
Overall, to make the operation of Matrix Profile clear, a graphic representation is given in Figure 3 (Nichiforov & Alamaniotis, 2021).
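To make the definitions above concrete, the following is a minimal brute-force sketch in Python (NumPy) of the z-normalized Euclidean distance and of a Matrix Profile computed from it. It is an illustration under stated assumptions, not the "MatrixProfile" library used in this work: the function names are hypothetical, the trivial-match exclusion zone defaults to half the window length (an assumption), and windows with zero standard deviation are not handled. Its cost grows quadratically with the series length, so practical tools rely on faster algorithms such as STOMP or SCRIMP++.

```python
import numpy as np


def znorm_distance(x, y):
    """z-normalized Euclidean distance between two equal-length subsequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population mean/std (1/m normalization), as in the definitions above.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(np.sqrt(np.sum((zx - zy) ** 2)))


def naive_matrix_profile(ts, m, excl=None):
    """Brute-force Matrix Profile: for every window of length m, the distance
    to its nearest non-trivial neighbour and that neighbour's start index."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    if excl is None:
        excl = int(np.ceil(m / 2))  # assumed trivial-match zone (half a window)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    # z-normalize every window once; assumes no window is constant (std > 0).
    z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)
    mp = np.full(n_sub, np.inf)
    mp_idx = np.zeros(n_sub, dtype=int)
    for i in range(n_sub):
        d = np.sqrt(np.sum((z - z[i]) ** 2, axis=1))
        # Ignore trivial matches: windows starting within excl positions of i.
        d[max(0, i - excl):min(n_sub, i + excl + 1)] = np.inf
        j = int(np.argmin(d))
        mp[i], mp_idx[i] = d[j], j
    return mp, mp_idx


# Usage: a periodic signal with a planted anomaly (a level shift on 21 samples).
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 50)
series[500:521] += 2.0
profile, nn_index = naive_matrix_profile(series, m=50)
# The top discord is the window whose nearest neighbour is farthest away.
top_discord = int(np.argmax(profile))
```

Every window that does not touch the level shift has an exact periodic copy elsewhere, so its profile value is near zero; the maximum of the profile therefore falls on a window overlapping the anomaly, which is exactly the top-discord condition stated above.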

Learning methods
In the second step of our approach, supervised classification is performed separately with the KNN and SVM algorithms, i.e., two approaches are implemented: MP-KNN and MP-SVM. Although artificial neural networks prevail in the pattern recognition field, KNN and SVM models are equally well known for their successful application to various classification problems.
KNN is a standard supervised machine learning algorithm for pattern recognition, whose advantages are its simplicity and nonparametric character. In classification problems, KNN predicts a class label by considering the k nearest neighbors and assigning the data point to the class to which the majority of these k neighbors belong (Cover & Hart, 1967). In this way, the labeling assumes that the unknown data point is more likely to resemble the majority of its neighboring points than the rest.
Considering a training dataset $D$ of labeled examples $(p, q) \in D$, with $p \in \mathbb{R}^{n}$ and class label $q$, the purpose is to find the link between $p$ and $q$. The KNN algorithm seeks a function $f$ able to predict the corresponding output $\hat{q}_0$ for an unknown observation $p_0$:

$$\hat{q}_0 = f(p_0)$$

The algorithm computes the distance between the unknown incoming observation and each data point in the training dataset. The default distance metric is the Euclidean distance:

$$d(p, p') = \sqrt{\sum_{i=1}^{n} \left( p_i - p'_i \right)^{2}}$$

where $p, p'$ represent two data points.
Once the distances are computed, the algorithm estimates the probability that the unknown observation $p_0$ belongs to each class $j$ (Panapakidis et al., 2014):

$$P(q = j \mid p = p_0) = \frac{1}{K} \sum_{i \in A_0} I(q_i = j)$$

where $A_0$ is the set of the $K$ nearest observations and $I(q_i = j)$ is an indicator variable that equals 1 if a given observation $(p_i, q_i) \in A_0$ is a member of class $j$ and 0 otherwise. After estimating these probabilities, KNN assigns the observation $p_0$ to the class with the highest probability (Agarwal & Poornalatha, 2021).
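The class-probability estimate and the arg-max decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the classifier implementation used in this work; the function names are hypothetical, class labels are assumed to be integers 0..n_classes-1, and distance ties are broken by training-set order.

```python
import numpy as np


def knn_class_probabilities(X_train, q_train, p0, k, n_classes):
    """Estimate P(q = j | p0) = (1/K) * sum over A0 of I(q_i = j) for every class j."""
    X_train = np.asarray(X_train, dtype=float)
    q_train = np.asarray(q_train, dtype=int)
    # Euclidean distance from the unknown observation p0 to every training point.
    d = np.sqrt(np.sum((X_train - np.asarray(p0, dtype=float)) ** 2, axis=1))
    A0 = np.argsort(d, kind="stable")[:k]  # indices of the K nearest neighbours
    counts = np.bincount(q_train[A0], minlength=n_classes)
    return counts / k


def knn_predict(X_train, q_train, p0, k, n_classes):
    """Assign p0 to the class with the highest estimated probability."""
    return int(np.argmax(knn_class_probabilities(X_train, q_train, p0, k, n_classes)))


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [10, 0]]
q = [0, 0, 0, 1, 1, 1, 2]
probs = knn_class_probabilities(X, q, [4, 4], k=5, n_classes=3)  # [0.4, 0.6, 0.0]
label = knn_predict(X, q, [4, 4], k=5, n_classes=3)              # 1
```

Because the decision is an arg-max over class shares rather than a fixed threshold, the winning class in a multi-class problem can hold less than half of the K votes.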
Often, in the literature, KNN classification is described as a "majority voting" method, which refers to a vote share of >50% for making a decision in a binary problem. In multi-class problems, however, the winning share is not required to exceed 50% and may be lower; for instance, in a four-class problem, a share of >25% may be enough to assign a class label (Murphy, 2012).
SVM models belong to the family of supervised learning methods and are widely used in a variety of classification problems. The main principle of the SVM model is to identify the decision boundary that separates the classes while providing the minimum possible misclassification error. That is, the objective is to find a separating surface, e.g., a hyperplane, that maximizes the separation of the data points into their respective classes in an n-dimensional Euclidean space. The position of the hyperplane is determined by a subset of the training data known as the support vectors. Visually, the support vectors are the points with the minimum distance to the hyperplane. Furthermore, the perpendicular distance from the hyperplane to the closest data points defines a region called the margin of the classifier. In general, the SVM framework is defined as the optimization problem of finding the support vectors that maximize the margin (Tian et al., 2012).
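Analytically, the hyperplane is expressed as follows (Miller & Meggers, 2017):

$$\mathbf{w}^{T}\mathbf{x} + b = 0$$

where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input vector, and $b$ is the bias. After scaling, the support vectors satisfy:

$$\mathbf{w}^{T}\mathbf{x} + b = +1, \qquad \mathbf{w}^{T}\mathbf{x} + b = -1$$

The distance $d$ from a point $\mathbf{y}$ to the plane $\mathbf{w}^{T}\mathbf{x} + b = 0$ is computed as:

$$d = \frac{\left| \mathbf{w}^{T}\mathbf{y} + b \right|}{\lVert \mathbf{w} \rVert}$$

Hence, the distance from a support vector to the hyperplane is:

$$d = \frac{1}{\lVert \mathbf{w} \rVert}$$

and the closest distance between the classes, i.e., the margin $m$, is:

$$m = \frac{2}{\lVert \mathbf{w} \rVert}$$

The purpose of the SVM model is to maximize $m$, which is formulated as:

$$\max_{\mathbf{w}, b} \frac{2}{\lVert \mathbf{w} \rVert} \quad \text{subject to} \quad q_i \left( \mathbf{w}^{T}\mathbf{x}_i + b \right) \geq 1, \quad i = 1, \dots, N$$

where $q_i \in \{-1, +1\}$ is the class label of training point $\mathbf{x}_i$; this is equivalent to minimizing $\frac{1}{2}\lVert \mathbf{w} \rVert^{2}$ under the same constraints. To apply SVM models to multi-class problems, the multi-class problem is broken down into multiple binary classification problems (e.g., one-vs-rest or one-vs-one).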
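The geometric quantities above (point-to-hyperplane distance and the margin 2/||w||) and a soft-margin linear SVM can be sketched with NumPy as follows. This is an illustrative assumption, not the solver used in this work: the hypothetical train_linear_svm minimizes 0.5*||w||^2 + C * sum(max(0, 1 - q_i (w^T x_i + b))) by full-batch subgradient descent with a fixed step size, labels q_i are in {-1, +1}, and only the binary case is shown (a multi-class problem would wrap it, e.g., one-vs-rest).

```python
import numpy as np


def train_linear_svm(X, q, C=0.1, lr=0.01, epochs=2000):
    """Soft-margin linear SVM (primal) via full-batch subgradient descent."""
    X = np.asarray(X, dtype=float)
    q = np.asarray(q, dtype=float)  # labels in {-1, +1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = q * (X @ w + b) < 1  # points inside the margin or misclassified
        # Subgradient of 0.5*||w||^2 + C * (sum of hinge losses).
        grad_w = w - C * (q[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * q[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def point_to_hyperplane(y, w, b):
    """Distance from point y to the hyperplane w^T x + b = 0."""
    return abs(float(np.dot(w, y)) + b) / np.linalg.norm(w)


def margin_width(w):
    """Margin between the two supporting hyperplanes w^T x + b = +1 and -1."""
    return 2.0 / np.linalg.norm(w)


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
q = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, q, C=0.1)
predictions = np.sign(np.asarray(X, dtype=float) @ w + b)
margin = margin_width(w)
```

A smaller C tolerates more margin violations in exchange for a wider margin; C = 0.1 mirrors the value reported later for the linear-kernel SVM, but this toy solver is not claimed to reproduce that model.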
Lastly, the flowcharts of the two classification models (KNN and SVM) are given in Figure 4, where their steps are highlighted.

Dataset description
The data used for the experiments are taken from the Building and Urban Data Science (BUDS) Group at the National University of Singapore and are part of an open-source data collection of several nonresidential buildings (Van Benschoten et al., 2020). In the current research, 336 load patterns from academic institutes were used, collected over a one-year period with a sampling time of one hour. As shown in Table 1, the selected time series belong to four academic institutes. Three of them are located in US cities (New York City (NYC), Chicago, and Phoenix), and the fourth is located in London, United Kingdom. In each institute, four types of load patterns were identified: laboratories, dormitories, classrooms, and offices, as shown in Figure 1. Counting the patterns in each category shows that the data are slightly imbalanced. Each time series consists of 8760 data points (365 days × 24 hours), and Table 2 shows the descriptive statistics of the data (mean, standard deviation (std), minimum (min), and maximum (max) values). Overall, the goal is to match each load pattern, independently of its type, to the academic institute it comes from.
Data preparation is an important aspect of load pattern analysis. To that end, the load pattern time series were preprocessed to remove missing values and outliers. Figure 5 presents a sample of the weekly load profile of a classroom from each of the four academic institutions.

Attribution results
In the first step, MP is applied to all available time series. The Matrix Profile was computed on the time-series load patterns using the "MatrixProfile" library (Raschka, 2018) in Python 3, and it was applied to all 336 time series. As mentioned in the previous section, the only parameter that needed to be tuned was the window length, which was set to one week (168 data points) based on the authors' prior experience. Next, among the discords identified by the Matrix Profile, the subset of the top 10 discords was determined by tuning the model parameter that defines the exclusion zone. In particular, the exclusion zone length was set to 2 weeks (i.e., 336 data points) to avoid trivial matches, i.e., two or more discords corresponding to the same event (a common problem in MP applications).
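The selection of the top 10 discords with an exclusion zone can be sketched as a greedy search over a precomputed Matrix Profile: take the largest MP value, record its index, mask the exclusion zone around it, and repeat. This is a minimal NumPy illustration, not the "MatrixProfile" library's own discord routine; the function name top_k_discords is hypothetical, and non-finite MP entries are assumed to be safe to ignore.

```python
import numpy as np


def top_k_discords(mp, k=10, exclusion_zone=336):
    """Greedily pick the k largest Matrix Profile values, suppressing an
    exclusion zone around each pick so that one event yields one discord."""
    mp = np.array(mp, dtype=float)      # work on a copy
    mp[~np.isfinite(mp)] = -np.inf      # never select undefined (inf/NaN) entries
    discords = []
    for _ in range(k):
        i = int(np.argmax(mp))
        if mp[i] == -np.inf:            # nothing selectable remains
            break
        discords.append(i)
        lo, hi = max(0, i - exclusion_zone), min(len(mp), i + exclusion_zone + 1)
        mp[lo:hi] = -np.inf             # suppress trivial matches of the same event
    return discords


# Settings reported in this section: a one-week window (m = 168) on 8760 hourly
# samples gives a profile of length 8760 - 168 + 1 = 8593. A synthetic profile
# stands in for a real one here.
rng = np.random.default_rng(0)
profile = rng.random(8760 - 168 + 1)
indices = top_k_discords(profile, k=10, exclusion_zone=336)
```

Each returned index is the start of a one-week window; any two selected discords start more than 336 samples (two weeks) apart.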
Figures 6 and 7 present the MP results for two of our test datasets (specifically, two laboratories in Chicago and London, respectively). The top part of each figure presents the actual load pattern over a one-year period, measured in kW, while the bottom part depicts the corresponding Matrix Profile (i.e., the z-normalized Euclidean distance) and designates the top 10 discords. The identified top 10 discords are marked with red star symbols and correspond to the highest relative peaks in the respective MP plot. Once the subset of 10 discords is identified, the next step, supervised classification, is initiated: the discords identified in the first step are fed as input to the classification module (either the KNN or the SVM model).
The output of the classification model is a label corresponding to one of the academic institutes in the predetermined list. Specifically, the models map the test data to one of the four predefined labels (one for each institute). The labels are presented in Table 3 and correspond to the four campuses identified in our datasets. Both approaches, i.e., MP-KNN and MP-SVM, were trained with 80% of the data (presented in Table 1) and tested with the remaining 20%. First, the KNN model was applied with k = 35 neighbors; Figure 8 shows the Mean Squared Error (MSE) for various values of k, obtained using k-fold cross-validation, from which the optimal k was selected. Second, the SVM model was applied with a linear kernel; cross-validation indicated that the best value of the soft-margin constant was C = 0.1 (Nwankpa et al., 2018).
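Selecting the number of neighbors by k-fold cross-validation can be sketched as follows. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, folds are formed from a seeded random permutation (not stratified), and the misclassification rate is used as the validation error, whereas Figure 8 reports the MSE. The toy data below are two well-separated synthetic clusters.

```python
import numpy as np


def knn_predict_batch(X_tr, y_tr, X_te, k):
    """Plurality vote among the k Euclidean nearest neighbours of each test row."""
    d = np.sqrt(((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2))
    nn = np.argsort(d, axis=1, kind="stable")[:, :k]
    n_classes = int(y_tr.max()) + 1
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in y_tr[nn]])


def cv_error_for_k(X, y, k_values, n_folds=5, seed=0):
    """Mean validation misclassification rate of KNN for each candidate k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    errors = []
    for k in k_values:
        fold_errors = []
        for f in range(n_folds):
            val = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            pred = knn_predict_batch(X[train], y[train], X[val], k)
            fold_errors.append(np.mean(pred != y[val]))
        errors.append(float(np.mean(fold_errors)))
    return np.array(errors)


rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(0, 1, (20, 2)), rng.uniform(10, 11, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
k_values = [1, 3, 5]
errors = cv_error_for_k(X, y, k_values)
best_k = k_values[int(np.argmin(errors))]
```

In a setup like the one described, this search would run only on the 80% training portion, so that the 20% test split stays unseen until the final evaluation.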
As a benchmark, a state-of-the-art ANN model was trained on the same institute load patterns. The model is a fully connected neural network with an input layer of 30 neurons, two hidden layers of 10 neurons each, and an output layer of 4 neurons. The hyperbolic tangent (tanh) activation function was used for the input and hidden layers, and the softmax activation function for the output layer (Kingma & Ba, 2014). The ANN model was trained with the Adam optimizer (Sokolova & Lapalme, 2009).
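The benchmark architecture (30, 10, 10, and 4 units, tanh in the first three layers, softmax output) can be sketched as a NumPy forward pass. This is a hedged illustration only: the input dimension is not stated in the text, so input_dim=168 below is a hypothetical choice; the Glorot-uniform initialization is an assumption; and training with the Adam optimizer is omitted (in practice a deep learning framework would handle it).

```python
import numpy as np


def init_mlp(input_dim, layer_sizes=(30, 10, 10, 4), seed=0):
    """Weights and biases for a fully connected network (Glorot-uniform init, assumed)."""
    rng = np.random.default_rng(seed)
    params, fan_in = [], input_dim
    for size in layer_sizes:
        limit = np.sqrt(6.0 / (fan_in + size))
        params.append((rng.uniform(-limit, limit, size=(fan_in, size)), np.zeros(size)))
        fan_in = size
    return params


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, X):
    """tanh on every layer except the last, softmax over the 4 institute classes."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)


params = init_mlp(input_dim=168)  # hypothetical input size (one week of hourly load)
batch = np.random.default_rng(1).normal(size=(5, 168))
class_probs = forward(params, batch)  # one row of 4 class probabilities per sample
```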
Classification performance was evaluated with the F1-score and the classification accuracy (and, indirectly, the precision and recall) metrics (Grandini et al., 2020). Accuracy and F1-score are given by:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where precision and recall are obtained by:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

with TP denoting True Positives, TN True Negatives, FP False Positives, and FN False Negatives. Figure 9 depicts the F1-score values obtained on the test datasets. The F1-score is computed separately for each institute class because the problem is multiclass: each class's performance is rated separately, as if there were a distinct classifier for each class. Figure 9 shows that the KNN and SVM models identify academic institutes more accurately than the ANN model.
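The per-class metrics above can be computed from a multi-class confusion matrix by treating each class as the positive class in turn (one-vs-rest). In this minimal NumPy sketch the helper names are hypothetical, rows are assumed to hold true classes and columns predicted classes, undefined ratios (zero denominators) are set to 0, and the confusion matrix is a made-up example.

```python
import numpy as np


def _safe_div(num, den):
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)


def per_class_metrics(cm):
    """One-vs-rest accuracy, precision, recall and F1 for each class of a
    confusion matrix whose rows are true classes and columns predicted ones."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp   # belonging to the class but predicted elsewhere
    tn = total - tp - fp - fn
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "overall_accuracy": tp.sum() / total,
    }


cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 0, 4]]
metrics = per_class_metrics(cm)
```

Note that the one-vs-rest accuracy of each class differs from the overall multi-class accuracy (trace divided by total), which is the figure usually reported for the whole classifier.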
To obtain an additional performance score for each model and further confirm our observations, we computed the accuracy and a single F1-score value for each case using three averaging methods (Hossin & Sulaiman, 2015):
• 'macro': computes the F1-score of each class independently and takes their unweighted average;
• 'micro': counts true positives, false positives, and false negatives globally (over all classes) and computes a single F1-score from these totals;
• 'weighted': averages the class F1-scores, weighted by the proportion of instances in each class.
Among the three averaging methods, the micro-averaged F1-score was chosen as best describing the overall performance of the models. This is mainly because our problem is a multiclass one, and micro and weighted averages aggregate the contributions of all classes to compute an average metric. In contrast, the macro average treats all classes equally, computing the F1-score independently for each class and then taking the plain average (i.e., without weighting by the number of instances).
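The three averaging schemes differ only in where the summation happens, as the following NumPy sketch shows. The helper name is hypothetical, rows of the confusion matrix are assumed to hold true classes and columns predicted classes, and the matrix is a made-up example. It uses the identity F1 = 2TP / (2TP + FP + FN), which is equivalent to the precision/recall form.

```python
import numpy as np


def averaged_f1(cm):
    """Macro, micro and weighted F1 from a confusion matrix (rows = true classes)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    support = cm.sum(axis=1)                 # number of true instances per class
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom != 0)
    micro_tp, micro_fp, micro_fn = tp.sum(), fp.sum(), fn.sum()
    return {
        "macro": f1.mean(),                                      # plain mean of class F1s
        "micro": 2 * micro_tp / (2 * micro_tp + micro_fp + micro_fn),  # global counts
        "weighted": np.sum(f1 * support) / support.sum(),        # weighted by class size
    }


cm = [[5, 1, 0],
      [2, 3, 1],
      [0, 0, 4]]
scores = averaged_f1(cm)
accuracy = np.trace(np.asarray(cm)) / np.sum(cm)
```

In single-label multi-class classification, every misclassified sample counts as one false positive (for the predicted class) and one false negative (for the true class), so the micro-averaged F1-score coincides with the accuracy; here both equal 0.75.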
Figure 10 and Table 4 show that the MP-KNN and MP-SVM approaches provide higher accuracy than the ANN model. For this case study, the accuracy is ~60%, implying that roughly 6 out of 10 load patterns are correctly attributed to their institute using only the top 10 discords of each load pattern.
In support of the previous results, Figures 11, 12, and 13 provide the normalized confusion matrices of all three classification models. A confusion matrix summarizes the performance of a classifier: the main diagonal gives the correct detection rate of each class, and the remaining entries give the misclassification rates (Nichiforov et al., 2021).
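A row-normalized confusion matrix of this kind can be obtained as below. Under the assumed convention (rows = true institute, columns = predicted institute), each diagonal entry is the fraction of that institute's patterns attributed correctly, i.e., its recall. The helper names are hypothetical, and labels are assumed to be integers 0..n_classes-1.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def normalize_rows(cm):
    """Divide each row by its total so that row i gives the distribution of
    predictions for true class i (empty rows stay all-zero)."""
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros_like(cm), where=totals != 0)


cm = confusion_counts([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3)
cm_normalized = normalize_rows(cm)
per_class_recall = np.diag(cm_normalized)
```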
The confusion matrices confirm that the proposed approach using discord detection, in the form of either MP-KNN or MP-SVM, yields higher correct-attribution rates than the full-pattern ANN attribution model. The KNN model correctly attributes 78% of the NYC institute's patterns, 60% of Chicago's, 45% of Phoenix's, and 50% of London's. Likewise, the SVM model correctly identifies 81% of the patterns of the NYC institute, 67% of Chicago, 50% of Phoenix, and 20% of London. The ANN confusion matrix further shows that the load pattern-based ANN model is not capable of identifying the academic institutes, since almost all consumption patterns are attributed to the NYC institute; this result is explained by the complexity of the consumption patterns and the imbalanced number of training samples. Thus, the MP-supervised learning approach using discords also coped better with the imbalanced number of samples across the predefined list of institutes.

Conclusion
In the current paper, we propose a new approach based on the synergism of the Matrix Profile data mining technique with supervised classification algorithms (specifically, the KNN and SVM models). The novelty of the approach lies in using load discords in measured patterns as features for attributing complex consumption behavior (patterns) to a predefined list of specific entities. The proposed approach was applied to multiple consumption data sets of various types taken from four different academic institutes. Each academic institute comprises several buildings with various profiles, which makes the institute's overall profile appear highly complex. For benchmark purposes, we used the full-pattern identification method implemented by a feed-forward artificial neural network.
The results demonstrated that the proposed approach (in both of its forms), using only the top 10 discords identified in the load patterns, correctly attributed load patterns to their institutes with an accuracy of ~60%. In contrast, the identification accuracy of the full load-based ANN is ~48%, with a biased recognition that attributes the vast majority of the patterns to the NYC institute. Notably, for such complex consumption patterns, the full load-driven ANN is not capable of capturing the distinctive features of each institute's consumption, as the proposed approach does.
It should be emphasized that this novel method is applicable to activities such as energy management, the identification of illicit activities, and privacy and security. Attributing consumption behaviors to specific entities enables the confirmation and inference of consumers' expected and registered activities.

Figure 1. Block diagram of electricity consumption patterns of an academic institute.

Figure 2. Attribution approach steps.

Figure 5. Example of classroom weekly load profile.

Figure 6. Chicago load time series vs. Matrix Profile with top 10 yearly discords.

Figure 7. London load time series vs. Matrix Profile with top 10 yearly discords.

Figure 8. Performance of KNN for various values of k.