An L 1 -and-L 2 -regularized nonnegative tensor factorization for power load monitoring data imputation

As smart grid advance, Power Load Forecasting (PLF) has become a research hotspot. As the foundation of the forecasting model, the Power Load Monitoring (PLM) data takes on great importance due to its completeness, reliability and accuracy. However, monitoring equipment failures, transmission channel congestion and anomalies result in missing PLM data, which directly affects the performance of the PLF model. To address this issue, this paper proposes an L 1 -and-L 2 -Regularized Nonnegative Tensor Factorization (LNTF) model to impute PLM missing data. Its main idea is threefold: (1) combining L 1 and L 2 norms to achieve effective feature extraction and improve the model ’ s robustness; (2) incorporating two temporal-dependent linear biases to describe the ﬂ uctuations of PLM data; (3) adding nonnegative constraints to precisely de ﬁ ne the nonnegativity of PLM data. Extensive empirical studies on two publicly real-world PLM datasets with 1,569,491 and 413,357 known entries and missing rates of 93.35% and 96.75% demonstrate that the proposed LNTF improves 14.04%, 59.31%, and 71.43% on average over the state-of-the-art imputation models in terms of imputation error, convergence rounds, and time cos, respectively. Its high computational ef ﬁ ciency and low imputation error make practical sense for PLM data imputation.


Introduction
As one of the most commonly utilized technologies in the electric power industry, the Power Load Forecasting (PLF) (Chakhchoukh et al., 2010;Borges et al., 2012;Zhang et al., 2022;Yang Y. et al., 2023;Wang et al., 2024) is critical to the operation and planning for the power systems, which helps power companies make reasonable decisions to improve the stability and efficiency of the electric grid (Hafeez et al., 2020;Hammad et al., 2020;Li et al., 2021;Duan et al., 2023;Shirkhani et al., 2023).On the other hand, the Power Load Monitoring (PLM) (Tabatabaei et al., 2016;Abubakar et al., 2017;Wang et al., 2023) observes and records various parameters in the power systems, thereby effectively monitoring the power load situation (Zhang et al., 2023;Shao et al., 2023;D'Incecco et al., 2019;Song et al., 2022), and it provides an extensive data foundation for PLF models.However, there are missing PLM data due to sensor failures, network failures, and transmission delays (Wang et al., 2021).In addition, since PLM data needs to be collected frequently, its data scale is large.Therefore, the PLM data is High Dimensional and Incomplete (HDI) (Hu et al., 2020;Luo et al., 2021a;Wu and Luo, 2021;Wu et al., 2023a;Wu et al., 2023b).
Although the existing imputation models are able to complete the imputation of missing PLM data, but their computational and time overheads are expensive and they do not consider the simultaneous missing of multiple monitoring parameters.In contrast, the PLM data collects various monitoring parameters such as electric current, electric voltage, and electric power.On the other hand, temporal variations significantly affect model performance.Since PLM data have temporal characteristics, such as the difference between daytime and nighttime loads, different weekday and weekend usage patterns, and seasonal variations in summer and winter, the static model cannot accurately predict the load variations.Therefore, it is essential to implement a temporal variation-incorporated imputation model that can be applied to multiple monitoring parameters with efficient computation and fast response on HDI PLM data (Wu et al., 2019;Chen J. et al., 2021;Hu et al., 2021;Li Z. et al., 2022;Wu et al., 2022a;Wu et al., 2022b).
By previous studies (Nie et al., 2016;Luo et al., 2019a;Wu et al., 2020;Wu et al., 2021a;Chen K. et al., 2021;Luo et al., 2021b;Zhao et al., 2021;Qin et al., 2023), Tensor Factorization (TF)-based imputation methods are able to efficiently impute missing data by constructing the data as a higher-order tensor.Specifically, based on the Canonical Polyadic Decomposition (CPD) (Wu et al., 2021b), the TF-based model decomposes the higher-order tensor into an outer product of multiple Factor Matrices (FM) to provide a concise and unique representation (Bulat et al., 2020).For instance, Wu et al. (2021a) proposed a Proportional-Integral-Derivative (PID) TF-based model to impute missing data, which adopts the PID control principle to construct the tuning instance error, and optimizes the model parameters by the Stochastic Gradient Descent (SGD) algorithm to achieve a lower imputation error.Wang et al. (2019) incorporated a momentum term into the solving process of SGD based on CPD to accelerate the convergence process.Moreover, the TF-based models can naturally incorporate nonnegative constraints to deal with the data with nonnegativity, called Nonnegative Tensor Factorization (NTF) models (Qin et al., 2022;Xu et al., 2023;Chen et al., 2024).Luo et al. (2022) proposed an NTF model incorporating the neural dynamics principle, which constructs the CPD process as a neural network layer and introduces nonlinearity through the activation functions, thereby achieving efficient imputation of missing data.Che et al. (2023) proposed a sparse and graph regularization TF-based model for fake news detection, which employs a monotonically non-increasing algorithm to solve the model efficiently and obtain excellent model performance.
Therefore, this paper proposes an L 1 -and-L 2 -regularized-based Nonnegative Tensor-factorization (LNTF) model to impute missing PLM data.It employs a tensor to model PLM data with temporal features.Specifically, during 1 day, we arrange the sampling frequency by time and it preserves the temporal pattern of the monitored parameters.In addition, the cyclical structure of the data is further preserved by arranging the monitoring parameters of each day by time.As a result, the temporal dynamics of the data is fully reflected in the constructed three-order tensor.It further incorporates temporal-dependent linear biases to address the fluctuations of PLM data.Finally, it combines L 1 and L 2 regularizations to improve the generalization ability and robustness of the model, thereby achieve an efficient model with low imputation error and fast convergence.The main contributions of this paper are presented as follows: 1) A three-order tensor construction.It fully preserves the temporal pattern of PML data.2) An LNTF model.It provides highly accurate imputation for missing PLM data.3) An effective LNTF learning scheme.ItschI guarantees the nonnegativity of PML data.
Experimental results on two publicly available datasets indicate that the LNTF model is superior to the existing imputation models for PLM data in terms of imputation error, convergence rounds, and time cost.
Section 2 presents the related works, Section 3 gives the Preliminaries, and Section 4 introduces the proposed LNTF model in detail.Section 5 shows the comparative experiments.The conclusions and future work are provided in Section 6.

Related works
For data imputation, common methods such as Lagrange interpolation (Allik and Annuk, 2017) and the K-nearest neighbors method (Miao et al., 2016) are often employed.However, these methods perform poorly in scenarios with high missing rates.Liu et al. (2012) defined tensor nuclear norm as the combination of matrix nuclear norms obtained by unfolding the tensor along its modes.Yang and Nagarajaiah (2016) utilized prior knowledge of data structure to minimize the nuclear norm and complete low-rank matrices.Gao et al. (2017) designed a subspace merging method to represent real-world datasets by merging multiple subspaces into a larger one for missing data imputation.However, converting a third-order tensor into matrices loses the original multi-dimensional structural information.Kilmer and Martin (2011) proposed the tensor singular value decomposition method, which demonstrates the multi-dimensional structure by constructing a group ring along the tensor fibers, effectively capturing the inherent low-rank structure of the tensor.He et al. (2019) proposed a Kalman filter model that derives a recursive analytical solution and uses the least squares method to estimate the missing data.To date, researchers have proposed various data imputation models for missing PLM data.Alamoodi et al. (2021) proposed a deep learning-based imputation model for incomplete data, which captures the spatial-temporal relationships of the data through deep learning models and completes the imputation of missing data.Amritkar and Kumar (1995) constructed an encoderdecoder model to impute missing data employing long-short-term memory networks and graph convolution principles.Yang T. et al. (2023) proposed a missing data imputation model based on an improved generative adversarial network, which adopts a deep network model to extract the spatial-temporal correlation of data to achieve better imputation performance.Zhang et al. (2020) proposed a relational graph neural network with a hierarchical attention mechanism, which effectively exploits neighborhood information to highlight the importance of different entities, thereby completing accurately the missing data imputation.Luo et al. (2023) proposed a graph neural network based on attention relation paths, and introduced a neural process based on normalized flow to improve the performance of imputation.Nevertheless, these models generating the entire tensor and calculating the singular values of matrices or tensors, leading to low computational efficiency and excessively high memory requirements.

Symbol appointment
The symbol system adopted in this paper is presented in Table 1.

Problem formulation
Figure 1 presents the process of modeling PLM data to an HDI tensor.Specifically, a set of monitoring parameter data first forms a parameter vector.Then a frequency-parameter matrix is constituted by multiple samples per day.Finally, multiple frequency-parameter matrices are stacked along the temporal dimension to construct a third-order tensor.Based on previous research (Luo et al., 2017;Luo et al., 2021c), this paper follows the principle of CPD to implement TF process as shown in Figure 2.
The subsets of Λ linked with entities I, J, and An HDI tensor from PLM data.
Frontiers in Energy Research frontiersin.org03 Luo et al. 10.3389/fenrg.2024.1420449 With ( 1), an HDI tensor can be decomposed to R rankone tensors.Thus, by unfolding the rank-one tensor, each element x ijk in X is expressed by (Eq.2): ( 2 To obtain the desired three FMs, we depend on the Euclidean distance (Wang et al., 2005;Shang et al., 2021) to construct an objective function ε, it's given as (Eq.3): Note that numerous entries in Y are unknown, and we define ε on known entries Λ to accurately characterize the difference between the original element y ijk and the reformulated element x ijk as (Eq.4) (Luo et al., 2015;Luo et al., 2019b): Normally, the PLM data is nonnegative and it is necessary to add nonnegative constraints for FMs to correctly describe the data's     nonnegativity.Therefore, the objective function of the NTF is formulated as (Eq.5): (5)

LNTF model 4.1 Objective function
For the HDI PLM tensor, the monitoring parameter data fluctuate with sampling frequency and time.According to previous research (Li et al., 2016;Yuan et al., 2018), it is essential to incorporate the linear bias (LB) into the model to improve the performance when analyzing fluctuating data.Note that the constructed HDI PLM tensor has time-varying characteristics in I-dimension and K-dimension, we incorporate the corresponding two temporal-dependent LBs into the reformulated element x ijk as (Eq.6): To deal with the unbalanced distribution of known entries and to avoid overfitting the model, it is necessary to incorporate regularization terms in the objective function (Luo et al., 2019c;Liu et al., 2023).Typically, L 1 -regularized makes the FMs sparse during the optimization process, which is effective for selecting features since it automatically removes insignificant features.However, incorporating L 1 -regularized results in a non-smooth objective function (Grasmair et al., 2011;Wu D. et al., 2021).On the other hand, L 2 -regularized tends to optimize the FMs uniformly by narrowing the feature values and improving the robustness of the model, but it is sensitive towards outliers (Zeng et al., 2017;Jin et al., 2019;Benjamin and Yang, 2021).Hence, we assemble the L 1regularized and L 2 -regularized to combine their advantages, while adopting an approximation to deal with the absolute value in L 1regularized to make the objective function smooth.Further, the objective function is formulated as (Eq.7): where λ and λ b are the regularized constants for FMs and LBs, respectively, and τ is a tiny positive number.

Learning scheme
Considering the nonnegativity of PLM data, we implement the NMU for FMs and LBs based on previous studies (Badeau where η ij and η i denote the learning rates for d ir and p i , respectively.Note that we only present the update rules for one FM and one LB, since similar inferences are applied to {t jr , l kr } and q k .Further, we separate the positive and negative components in learning rule as: x ijk t jr l kr + λ d ir + d ir In order to cancel the negative component in ( 8), the learning rates are adjusted as follows: With ( 9), (10), we obtain the multiplicative update rules for FMs and LBs as:   11) With (11), if D, T, L, p, and q are initially nonnegative, they maintain their nonnegativity during the update process.

Algorithm design and analysis
Based on the above inferences, we design the LNTF model as shown in Algorithm 1.Note that the pseudocodes of procedures {Procedure_D, Procedure_T, Procedure_L} and {Procedure_p, Procedure_q} in LNTF are similar, and we provide the update procedure for U and p as presented in Procedure_D and Procedure_p.
With Algorithm 1, the computational complexity of the LNTF model depends on the complexity of initializing and updating the FMs and LBs.Specifically, the computational complexity of procedure Update_D is presented as (Eq.12): Similarly, the computational complexity of procedure Update_ p is given as (Eq.13): Therefore, the computational complexity of LNTF model is: Note that in ( 14), we omit the constant coefficients and lower order terms since |Λ| ≫ max{|I|, |J|, |K|} for an HDI tensor.With ( 14), n and R are positive constants in practical application, thus the computational complexity of the LNTF model is linear with the number of known entities in the HDI tensor.
For the storage complexity of the LNTF model, which depends on the number of known entities, the caches of FMs and LBs, the auxiliary matrices and arrays in the update procedures.Therefore, the storage complexity of the LNTF model is: ) .
( 15) With (15), constant constants and lower order terms are omitted, and the storage complexity of the LNTF model is linearly related to the number of known entities in the HDI tensor.
5 Empirical studies

General settings
In this section, all the comparison experiments are run on a desktop computer equipped with a 2.10 GHz Intel Core i7-13700 and a NVIDIA GeForce RTX 3050 GPU card.The versions of Java and Python used are SE-8U212 and Python 3.6.5,respectively.

Datasets
In this paper, two public PLM datasets iawe (Batra et al., 2013) and ukdale (Kelly and Knottenbelt, 2015) are used to compare model performance.They record load monitoring data of multiple appliances in the users' homes, and the details are shown in Table 2.In D1, the monitoring parameters 13 parameters such as electric current, electric voltage, and electric power, etc., which are sampled 86,400 times per day for 21 days, and the number of known entries is 1,569,491.Note that we further split the datasets proportionally into disjoint training (Φ), validation (Ψ), and test sets (Ω) to evaluate the model performance, as shown in Table 3.In addition, we generate 20 random splits on each dataset to achieve unbiased model results.

Evaluation metrics
The prediction accuracy in a given HDI PLM tensor directly reflects the prediction performance of the model.Based on previous studies (Li W. et al., 2022;Yuan et al., 2022;Bi et al., 2023;Wu et al., 2023c;Qin and Luo, 2023), Root mean squared error (RMSE) and mean absolute error (MAE) are commonly adopted as evaluation metrics for prediction accuracy, and they are given as (Eq.16): where Ω denotes the testing set.Note that lower RMSE and MAE denote that the model provides higher prediction accuracy for the target tensor.

Compared models
The five state-of-the-art comparison models are given as follows: • LNTF (M1): The model presented in this paper.
• TCA (M2): A multi-dimensional tensor complementation model based on CP decomposition (Su et al., 2021), which trains the latent feature matrix by applying the least squares and gradient descent methods.• HDOP (M3): A multilinear algebraic model (Wang et al., 2016), which adopts tensor decomposition and reconstruction optimization for data complementation.Frontiers in Energy Research frontiersin.org • BCLFT (M6): A latent factor TF-based model (Dong et al., 2024) that incorporates transfer learning and builds auxiliary tensors to transfer knowledge from the auxiliary domain to the target domain to alleviate data sparsity.

Training settings
For the choice of rank, larger rank provides more model parameters to capture more complex patterns and structures.However, it also leads to higher computational and time overhead, and we empirically set rank to five for all models to provide fair experimental results.In addition, the hyperparameters λ and λ b significantly affect the performance of the model.The large regularization constants help to reduce the risk of overfitting, thereby increasing the robustness and generalization ability of the model.Relatively, small regularization constants lead to overfitting.Therefore, we search for the optimal hyper-parameters by grid search.Specifically, hyper-parameters λ and λ b perform a grid search in the range [10 −3 ,10 −2 ] with a step size of 10 −3 .For the comparison models, we refer to the reference (Luo et al., 2021c) to set their hyper-parameters as shown in Table 4.Note that the training process of the model is terminated if the iteration count reaches 10 3 or if the error between two consecutive training iterations is smaller than 10 −5 .

Comparison results
This section presents a comparison of the LNTF model against the five comparison models in terms of prediction error, iteration count, and time cost.Tables 5-7 present the statistics data of all the models, respectively.Figures 3-5 chart the comparison of RMSE and MAE for all models on the dataset.From these results, we find that: • The LNTF model has lower prediction error compared with the comparison models.As shown in Table 5 and Figure 3

Conclusion
In order to achieve the interpolation of PLM missing data, this paper proposes a LNTF model with low imputation error.In the LNTF model, we design two temporal-dependent linear biases to address the fluctuations of monitoring parameters.Further, we combine the advantages of L 1 and L 2 regularizations to accurately extract features of PLM data and improve the robustness of the LNTF model.In addition, the model parameters are updated by employing the NUM algorithm to accurately describe the nonnegativity of the PLM data.Finally, experiments on two datasets with different missing rates indicate that the LNTF model can effectively impute the PLM missing data, thereby providing a reliable data foundation for following data analysis.However, the following issues still need to be addressed: • The manual tuning method for two hyper-parameters in LNTF consumes a lot of time.Therefore, it is a desirable work to incorporate the hyper-parameter adaptive learning scheme.• Can we adopt other learning schemes such as ADMM algorithm (Hu et al., 2018;Luo et al., 2021d), second-order Newton's method to update the model parameters?
We are aiming to address the above issues in future work.
Definition 1. (Three-order HDI frequency-parameter-day tensor): Given a parameter-frequency-day tensor Y |I|×|J|×|K| , where each element y ijk denotes the ith (i∈I) parameter information in the jth (j∈J) sampling point of kth (k∈K) day.Λ and Γ denote the known and unknown data, respectively.Y is an HDI tensor if |Λ|≪|Γ|.
Definition 2. (Rank-one tensor): Given a rank-one tensor H |I|×|J|×|K| r , it can be written as the outer product of three factor vectors H r = d r°tr°lr , where d r , t r , and l r are the r-th rows of the factor matrices D |I|×|R| , T |J|×|R| , and L |K|×|R| , respectively.

FIGURE 2
FIGURE 2Latent factorization of an HDI tensor Y.
• CTF (M4): A data completion model employing CP decomposition (Ye et al., 2021), which adopts the Cauchy loss to measure difference between the predicted and true values and to improve the robustness of the model.• LightGCN-AdjNorm (M5): A graph embedding based model (Zhao et al., 2022) that realizes accurate data complementation by controlling the normalization strength and aggregation process.It computes the average loss by splitting the PLM tensor into a set of slices along the time dimension

TABLE 1
Adopted symbols and their descriptions.
FFrobenius norm of the enclosed matrix or tensor•Outer product of two vectors Λ, Γ Known and unknown data in an HDI tensor

TABLE 3
Details settings of training datasets.

TABLE 4
Hyper-parameter settings of all model on datasets.

TABLE 5
Prediction error of all models.

TABLE 7
Time cost of all models (seconds).
By contrast, the iteration counts of the LNTF model in RMSE are 37.20% ofM2, 11.42% of M3, 27.58% of M4, 24.61% of M6 and  45.23%, 7.6%, 18.44%, 25% ofthem in MAE, respectively.Note that M5 is a model based on graph neural networks, which does not involve the iteration count.When the training entries are increased to 20% of the known entries, the iteration count of the LNTF model in RMSE is 20, which is the comparison mode's 74.07%, 32.78%, 55.55%, and 64.51%, respectively.For MAE, LNTF's iteration count is 78.57%, 38.59%, 59.45%, and 62.85% of comparison models.Similar results can be encountered on other datasets.• The LNTF model has fewer time cost compared with the comparison models.As depicted in Figure 5 and Table 7, the LNTF model costs 2.489s to achieve the lowest RMSE on D2.2.Compared to the comparison models of 2.876s, 16.846s, 3.493s, 84s, and 7.024s, the time cost of the LNTF model is 86.54% of M2, 14.77% of M3, 71.25% of M4, 2.69% of M5, and 35.43% of M6.For MAE on D2.2, the LNTF model costs 2.815s to reach the lowest MAE, which is 94.52% of M2's 2.978s, 17.79% of M3's 15.821s, 76.41% of M4's 3.684s, 3.80% of M5's 74s, and 35.48% of M6's 7.932s.Note that the lower time cost of the LNTF model can also be observed on D1.1, D1.2, and D2.1.