Household Electricity Load Forecasting Based on Multitask Convolutional Neural Network with Profile Encoding

Household load forecasting is challenging because individual consumption profiles are highly uncertain. Traditional machine learning models try to capture this uncertainty through clustering, spectral analysis, and sparse coding with handcrafted features. More recently, deep learning techniques such as recurrent neural networks have attempted to learn the uncertainty with one-hot encoding, which is too simple and inefficient. In this paper, for the first time, we propose a multitask deep convolutional neural network for household load forecasting. The baseline of one branch is built on multiscale dilated convolutions for load forecasting. The other branch, based on a deep convolutional autoencoder, is responsible for household profile encoding. In addition, an efficient encoding strategy for household profiles is designed, which feeds a novel feature fusion mechanism integrated into the forecasting branch. The proposed network is trained and used for inference in an end-to-end manner. Ablation studies demonstrate the effectiveness of these innovations and good generalization in point and probabilistic load forecasting at household level, which is promising for demand response.


Introduction
Smart grid is considered as an electric grid that specializes in delivering electricity in a controlled and intelligent approach from points of generation to consumers, both of which form an integral part of the smart grid when customers are able to modify their purchasing patterns and behavior according to the received information, incentives, and disincentives [1][2][3]. Attractions of smart grid depend on its capability of improving reliability performance spontaneously, encouraging customers' responsiveness and advanced efficiency decisions between customers and utility providers [3,4]. Consequently, demand side management (DSM) occupies an essential integral part of smart grid [5][6][7]. Meanwhile, the smart meter plays a crucial role in DSM that is able to achieve energy savings, exploit renewable energy resources, and encourage customers' participation in energy market depending on deep cognition to residential load profiles or behaviors [8].
In particular, deep learning techniques such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been shown to outperform traditional methods [27,29,37] in load forecasting at different aggregation levels, owing to their ability to extract discriminative correlation features from sequences. Chitalia et al. [38] presented a robust short-term electrical load forecasting framework based on deep recurrent neural networks that captures variations in building operation regardless of building type and location. Deng et al. [9,29,39] devised a convolutional neural model with multiscale dilated kernels for short-term load or price forecasting, which is better at extracting significant features in time-series analysis. Sideratos et al. [40] proposed a fuzzy-based ensemble model for load forecasting using hybrid deep neural networks in a two-stage architecture that includes a radial basis function neural network (RBFNN) and a CNN. Li et al. [41] developed a convolutional long short-term memory-based neural network with selected autoregressive features to improve short-term household electricity load forecasting accuracy by employing three strategies: autoregressive feature selection, exogenous feature selection, and a "default" state to avoid overfitting at times of high load volatility. Dong et al. [42] designed a deep learning approach based on K-nearest neighbors to capture uncertainty and reflect the range of electrical load fluctuation.
Dudek et al. [43] presented a hybrid and hierarchical deep learning model for midterm load forecasting. The model combines exponential smoothing (ETS), advanced long short-term memory (LSTM), and ensembling. ETS dynamically extracts the main components of each individual time series and enables the model to learn their representation. The multilayer LSTM is equipped with dilated recurrent skip connections and a spatial shortcut path from lower layers, which allows the model to better capture long-term seasonal relationships and ensures more efficient training.
Although deep learning techniques have achieved better performance in short-term load forecasting, finding optimal training results requires proper lag selection and hyperparameter setting. This is a hard problem for which an optimal solution cannot be found in polynomial time. This hardness is accentuated by the complexity of electricity-consumption data patterns [44]. One effective strategy for obtaining an optimal model configuration relies on metaheuristic approaches [45][46][47][48][49][50][51][52][53][54][55][56], which are capable of finding near-optimal solutions in a very large search space. However, household short-term load forecasting poses some specific challenges.

Reducing Uncertainty.
Generally, load forecasting concentrates on different aggregation levels, such as the system, feeder, and regional levels. At some levels, individuals can be served by one forecasting model with shared parameters because their behaviors are similar. For instance, the load patterns of commercial buildings have a coarser granularity, where different rooms show regular consumption under external factors such as weather or central air conditioning. The load profiles of manufacturing enterprises show little variation because production plans are relatively stable. However, the load profiles of residences are more volatile and uncertain because of different lifestyles and random behaviors, which makes accurate forecasting very challenging. Previous studies [29,37] demonstrated that the original household load profile can be decomposed into a regular pattern, uncertainty, and noise. The regular pattern refers to the periodic load profile. Uncertainty depends on aperiodic and external factors such as weather, family activities, and individual preferences. Noise represents the residue that cannot be physically explained [11,13,37].
Most machine learning techniques are competent at learning linear relationships and exploring regular patterns. In contrast, approaches built on handcrafted features cannot deal with the uncertainty that accounts for a large proportion of the load at household level. Different households behave differently and show remarkable variations in their time series, which introduces strong stochasticity and nonlinearity. Consequently, a shared model based on traditional methods is of doubtful value. To address these problems, three categories of methods have been presented [37]: (1) Clustering approaches [36,57,58] group households with similar high-level behavior, which decreases the uncertainty within each category and extracts more regular patterns to facilitate household load forecasting. However, it is difficult to segment the subjects appropriately, which generally prevents an acceptable result, and the strategy is excessively sensitive to the dataset. Meanwhile, some studies [18][19][20][21] proposed aggregate load forecasting (ALF) to cancel out uncertainty. ALF, however, operates at a coarser level of granularity and is not specialized for individual households. (2) Spectral analyses such as Fourier transforms [23], wavelet analysis [22], and empirical mode decomposition were introduced to extract the regular pattern in the load profile. This strategy is not suitable for household load forecasting, since regular information accounts for a relatively small proportion of the load.
(3) In the domain of power delivery systems, sparse coding has been applied to the problem of energy disaggregation [59,60]. Recently, sparse coding has become preferred at household level because it gives each house a profile description and an efficient approach to separating uncertainty and to learning and representing gross patterns of individual consumption [37,61]. Yu et al. [61] analyzed and decomposed the dataset into fixed patterns that constitute the final encoding format, which lacks the flexibility to express various uncertainties. Shi et al. [37] used one-hot encoding to add individual features to a deep RNN for extracting uncertainty, achieving state-of-the-art performance. However, sparse encoding cannot handle massive numbers of users or express similarity among households effectively.
Through investigations, most studies have tried to adopt clustering, pattern separation, or sparse coding to learn the uncertainties of load profiles at household level. These strategies based on machine learning offer a restricted modeling capability with handcrafted features in time-series analysis. In recent years, deep learning has emerged as a powerful tool in areas such as image processing and data analysis. With deeper architectures and sophisticated operations, deep neural networks are better at learning discriminative features and nonlinear relationships, which helps to extract uncertainty in household load forecasting [29,37].

Building Effective Network.
Deep learning techniques have been applied to load forecasting, and most of them rely on RNNs or long short-term memory (LSTM) networks [27,37,62,63,64,65,66]. LSTM is derived from the RNN, and both are successful in sequence-to-sequence learning tasks such as speech recognition and natural language processing. However, when handling long sequences, RNNs suffer severely from vanishing gradients, and LSTM alleviates this problem only partly. Recent studies [9,29] revealed that the convolutional neural network (CNN) offers better accuracy as a result of its powerful capability for discriminative feature extraction. In addition, mechanisms such as residual connections prevent severe vanishing gradients even in deeper networks. Consequently, these techniques can be optimized to identify and learn both the regular pattern and the uncertainty in load profiles at household level.
In this paper, we propose a multitask convolutional neural network with household profile encoding (MCNN-HPC). The novel encoding branch gives a more effective description of household behavior, with a particular focus on uncertainty. In coordination with a multiscale dilated convolutional neural network [9], our proposed model provides state-of-the-art performance in very short-term load forecasting (VSTLF) at household level. The key contributions are as follows:
(i) We propose a multitask neural network that consists of two branches. The baseline of one branch is built on multiscale dilated convolutions for load forecasting. The other branch, based on a deep convolutional autoencoder, is responsible for household profile encoding.
(ii) A novel encoding strategy is designed to explore uncertainty in household behavior effectively. Compared with the traditional technique in deep neural networks, our proposed method is better at expressing individual behavior features and nonlinear correlations in time-series analysis.
(iii) We present a novel mechanism of feature fusion between the two branches, which can also be interpreted as a superior feature selection process and leads to a remarkable improvement in accuracy.
(iv) Our proposed network is trained and used for inference in an end-to-end manner. Ablation studies were conducted to demonstrate the effectiveness of the innovations and good generalization in point and probabilistic load forecasting at household level.
The rest of the paper is structured as follows: Section 2 defines the problem and describes the details of our proposed model. Section 3 introduces the experimental setup. Section 4 presents and discusses the detailed results of the comparison experiments. The conclusions are drawn in Section 5.

Problem Formulation.
Our research focuses on VSTLF at household level, that is, load forecasting for the nearest point (the next 30 minutes) in the very short term. In practice, we employ only the historical load data ($X_t$) for training and inference of our proposed neural network. In the electricity market, $X_t$ is easily acquired from households via smart meters. Consequently, our task is to build a nonlinear relationship $f$ between historical load sequences and predicted points:

$$Y = f(X_t),$$

where $X_t = [x_1, x_2, \ldots, x_t]$ denotes the historical load sequence observed at times $1, \ldots, t$, and $Y = [y_{t+1}, y_{t+2}, \ldots, y_{t+n}]$ represents the predicted output. Here $t$ and $n$ are the lengths of the input and output sequences. When $n = 1$, the prediction target is single-step forecasting (VSTLF). If $n > 1$, it is a multistep forecasting task. In point load forecasting, $y_t$ is a scalar, while in probabilistic load forecasting $y_t$ becomes a vector of length $q$, denoting the $q$ quantiles estimated at time $t$.
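The mapping from a historical window to the next point(s) can be illustrated by slicing a load series into input/target pairs. The following is a minimal sketch, not the authors' code: the function name `make_windows` and the toy series are hypothetical, while the window length of 48 and the single-step target follow the setting described in this paper.

```python
import numpy as np

def make_windows(load, t=48, n=1):
    """Slice a 1-D load series into (X_t, Y) pairs.

    Each input row holds t consecutive readings and each target row
    holds the n readings that immediately follow them.
    """
    load = np.asarray(load, dtype=float)
    num = len(load) - t - n + 1
    if num <= 0:
        raise ValueError("series is shorter than t + n")
    X = np.stack([load[i:i + t] for i in range(num)])
    Y = np.stack([load[i + t:i + t + n] for i in range(num)])
    return X, Y

# Toy series standing in for half-hourly smart meter readings.
series = np.arange(100, dtype=float)
X, Y = make_windows(series, t=48, n=1)   # single-step forecasting
```

Setting `n` greater than 1 yields the multistep variant of the same construction.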

Backbone Network.
Our proposed deep neural network consists of two branches corresponding to two different tasks, as shown in Figure 1. The forecasting branch is responsible for household load forecasting and forms the baseline of MCNN-HPC. The household profile branch provides encoding information for learning the uncertainty of individual behavior from the historical load profile. Feature Fusion 1 to 3 connect the two branches at different levels of the network and provide a better feature selection process than traditional approaches. Both branches are fused at the end of the network with a fully connected layer, so that training and inference for load forecasting at household level are performed end to end.

Forecasting Branch.
In the forecasting branch, the baseline of the network includes multiple convolutional blocks with kernels of different dilation rates, which are able to extract multiscale features reflecting various nonlinear relationships in the sequence. This strategy has been demonstrated to be an effective optimization of CNNs for load forecasting [9]. In practice, we set the input sequence to a 48-dimensional vector in which each point covers half an hour. Consequently, the input vector is the load sequence of the 24 hours before the predicted time. The forecasting branch consists of 8 convolutional blocks whose convolutional kernels use dilation rates of 1, 2, 4, and 8, and each block produces 8 × 48 × 1 feature maps. In order to avoid vanishing gradients and improve the quality of training, the forecasting branch adds residual connections between blocks, as illustrated in Figure 1.
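To make the effect of the dilation rate concrete, the sketch below applies a single dilated filter to a 48-point input for each of the four rates. It is an illustration under stated assumptions rather than the network itself: it uses NumPy instead of the Keras layers used in this work, and the kernel size of 3, the zero "same" padding, and the function name `dilated_conv1d_same` are all hypothetical choices.

```python
import numpy as np

def dilated_conv1d_same(x, kernel, dilation):
    """1-D dilated cross-correlation with zero padding that keeps the length."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = dilation * (len(kernel) - 1)   # distance covered by the dilated taps
    left = span // 2
    right = span - left
    padded = np.pad(x, (left, right))
    out = np.zeros_like(x)
    for j, w in enumerate(kernel):
        # Tap j reads the input shifted by j * dilation positions.
        out += w * padded[j * dilation: j * dilation + len(x)]
    return out

x = np.arange(48, dtype=float)            # stand-in for a 24-hour load window
kernel = np.array([1.0, 1.0, 1.0])        # hypothetical kernel of size 3
outputs = {d: dilated_conv1d_same(x, kernel, d) for d in (1, 2, 4, 8)}
# Number of input positions spanned by one filter at each dilation rate.
receptive = {d: d * (len(kernel) - 1) + 1 for d in (1, 2, 4, 8)}
```

With these assumptions a single filter spans 3, 5, 9, and 17 half-hour points, respectively, while the output length stays at 48, which is how blocks with different rates see the same window at different scales.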

Household Profile Branch.
The profile encoding branch is responsible for generating a personalized code that learns and reflects the uncertainty in the daily life of each household. The input of the household profile branch comes from a deep convolutional autoencoder (DCAE), illustrated in Figure 2. With its deeper network and convolutions, the DCAE is well suited to squeezing the input sequence into a latent-space representation that expresses the inherent features and nonlinear relationships of the time series [67,68]. Our DCAE has a symmetrical encoder-decoder structure with three convolutional blocks on each side. A 336-dimensional vector serves as the original and reconstructed input, where each point represents the load in one half hour of the week, averaged over the 52 weeks of one year. Through max-pooling and upsampling operations, the output of the middle layer (yellow color) is the specific encoding of each household profile with 42 dimensions, which is also the input of the household profile branch in Figure 1. In practice, we use the DCAE to generate a 42-dimensional household feature vector for each individual from historical load data, reflecting the discriminative uncertainty in behavior. In addition, we design the fully connected layers that constitute the household profile branch. After two shared layers, three kinds of fully connected layers with different numbers of neurons are linked to the forecasting branch for feature fusion.
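The dimensions quoted above can be checked with a short sketch. It builds the 336-point weekly profile (7 days × 48 half hours) by averaging 52 weeks and then applies three pooling steps to show how a length of 336 can shrink to 42. The pooling factor of 2 per block is an assumption inferred from 336 / 42 = 8; the function names and the random data are hypothetical, and the learned convolutions of the real DCAE are omitted.

```python
import numpy as np

def weekly_profile(half_hourly, weeks=52):
    """Average a year of half-hourly readings into one 336-point week."""
    points_per_week = 7 * 48
    data = np.asarray(half_hourly, dtype=float)[: weeks * points_per_week]
    return data.reshape(weeks, points_per_week).mean(axis=0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along a 1-D vector."""
    x = np.asarray(x, dtype=float)
    return x.reshape(-1, size).max(axis=1)

rng = np.random.default_rng(0)
year = rng.random(52 * 336)          # synthetic stand-in for one household
profile = weekly_profile(year)       # shape (336,)

code = profile
for _ in range(3):                   # one pooling step per encoder block (assumed)
    code = max_pool1d(code, size=2)  # 336 -> 168 -> 84 -> 42
```

The resulting vector has the same length as the 42-dimensional household code, although in the actual autoencoder its values would come from trained convolutional filters rather than from pooling alone.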

Feature Fusion.
Feature selection is an essential process in which the features that contribute most to the prediction are selected automatically or manually. In time-series analysis, models based on machine learning use supervised or unsupervised algorithms to find more significant features and capture potential nonlinear relationships. For example, Cai et al. [69] proposed a direct multistep model based on a gated convolutional neural network (GCNN) for multistep load forecasting. The GCNN module uses a gating mechanism to select salient features in the CNN, achieving state-of-the-art performance. However, this model also suffers from the problems of LSTM, namely limited feature expression and vanishing gradients.
In this paper, we propose a novel feature selection process, shown in Figure 1, in which the outputs of the household profile branch act as learnable weights and are fused into the baseline of the forecasting branch by multiplication. The three outputs of the household profile branch are set to 1 × 48 × 1, 8 × 48 × 1, and 1 × 8 × 1, respectively. As shown in Figure 1, in Feature Fusion 1 and 2, element-wise multiplication provides a more effective feature selection process than traditional concatenation.
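The multiplicative fusion can be sketched with plain arrays. The example below is illustrative only: the random values stand in for feature maps and for profile-branch outputs, the array layout (8 channels by 48 time steps) is an assumed reading of the 8 × 48 × 1 shape, and the sigmoid gating and last-point selection for Feature Fusion 3 follow the description given in this section.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 48))      # 8 channels x 48 time steps

# Hypothetical outputs of the household profile branch after sigmoid gating.
w1 = sigmoid(rng.normal(size=(1, 48)))   # Fusion 1: one weight per time step
w2 = sigmoid(rng.normal(size=(8, 48)))   # Fusion 2: one weight per element
w3 = sigmoid(rng.normal(size=(8,)))      # Fusion 3: one weight per channel

fused1 = features * w1                   # broadcast across the 8 channels
fused2 = features * w2                   # full element-wise product
last_points = features[:, -1]            # last time step of each channel
fused3 = last_points * w3                # gates the most recent half hour
```

Because every weight lies between 0 and 1, each product can only attenuate a feature, which is why the operation behaves like a soft, per-household feature selector rather than adding new channels as concatenation would.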
The vector from the household profile branch is passed through a sigmoid activation, so its values lie between 0 and 1. Moreover, in Feature Fusion 3, as our proposed model focuses on single-step forecasting, we use the vector that consists of the last points (red color) in each channel of the feature maps for feature selection. This relies on the assumption that the load in the last half hour has the closest relationship with the forecasting point. Through end-to-end training, the well-trained household profile branch provides proper weights for each individual to extract salient features, so that both the regular and the uncertain patterns of each household can be captured with the shared model. Therefore, the entire MCNN-HPC is able to explore more nonlinear relationships in the consumption behavior of each house, achieving better performance with good generalization for load forecasting at household level.

The full anonymized dataset is publicly available online and comprises three parts: (1) half-hourly sampled electricity consumption (kWh) from each participant; (2) questionnaires and corresponding answers from surveys; (3) customer type, tariff, and stimulus description, which specifies customer types, allocation of tariff scheme, and demand side management (DSM) stimuli [37]. In detail, there were 929 residential customers who did not join any demand program and were assigned the control stimulus and tariff. In other words, their consumption realistically reflects behaviors that contain both regular patterns and uncertainty.

Software and Hardware Platform.
All experiments were conducted on a cloud server with two NVIDIA P4 computing cards and an 8-core CPU. Deep neural models were implemented in the Keras framework with the TensorFlow backend [70].

Program Implementation.
Our proposed model consists of two tasks: the forecasting branch and the household profile branch. At the beginning, an individual vector of the historical load profile is encoded by the DCAE, and the resulting 42-dimensional feature vector is then delivered to the household profile branch as input. With feature fusion, both branches are integrated in an end-to-end manner and contribute to load forecasting at household level. The implementation process is divided into three stages: (1) data preprocessing; (2) household profile encoding; (3) forecasting. Details are described in Figure 3. We trained the proposed model for all customers with parameters shared across households. During training, we used learning rate decay and early stopping based on the variation of the validation loss to reduce the computation cost and prevent overfitting.
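The interplay of learning rate decay and early stopping on the validation loss can be sketched without any deep learning library. The function below replays a list of validation losses and reports the learning rate used at each epoch and the epoch at which training would stop. All numbers (initial rate, decay factor, patience values) and the name `train_schedule` are hypothetical, since the actual settings are those listed in Table 1.

```python
def train_schedule(val_losses, lr=1e-3, decay=0.5, lr_patience=2, stop_patience=5):
    """Replay a validation-loss history under decay-on-plateau and early stopping.

    Returns the learning rate in effect after each epoch and the index of the
    epoch at which training stops (None if it runs to the end).
    """
    best = float("inf")
    since_best = 0
    history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_best = 0
        else:
            since_best += 1
            # Halve the rate every lr_patience epochs without improvement.
            if since_best % lr_patience == 0:
                lr *= decay
        history.append(lr)
        if since_best >= stop_patience:
            return history, epoch
    return history, None

losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76]
lrs, stopped_at = train_schedule(losses)
```

In a Keras training loop the same behavior is normally obtained with callbacks that monitor the validation loss, so this function is only meant to show the decision logic.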

Benchmarks and Setup.
For data preprocessing, because of noise interference, we removed some redundant data and filled in the missing values by linear interpolation. For training, the raw data from the Irish dataset, in which household load profiles are captured by smart meters every half hour, are converted into the inputs of the two branches. In the forecasting branch, for each household, the input sequence is the 24-hour load data before the forecasting time, a 48-dimensional vector.
Consequently, there are nearly 25,000 input samples, which are divided into a training set, a validation set, and a test set of 80%, 10%, and 10%, respectively. The 336-dimensional vector of the load profile in one year is encoded into a 42-dimensional one by the DCAE, and the feature vector is then delivered to the household profile branch as input for training and inference.
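The two preprocessing steps described above, filling gaps by linear interpolation and splitting samples 80/10/10, can be sketched as follows. This is an illustration under assumptions: missing readings are represented as NaN, the split is taken in chronological order (the paper does not state whether samples were shuffled), and both function names are hypothetical.

```python
import numpy as np

def fill_missing(load):
    """Fill NaN readings by linear interpolation between valid neighbours."""
    load = np.asarray(load, dtype=float).copy()
    idx = np.arange(len(load))
    missing = np.isnan(load)
    load[missing] = np.interp(idx[missing], idx[~missing], load[~missing])
    return load

def split_indices(num_samples, train_pct=80, val_pct=10):
    """Return index arrays for an ordered train/validation/test split."""
    n_train = num_samples * train_pct // 100
    n_val = num_samples * val_pct // 100
    idx = np.arange(num_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

filled = fill_missing([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
train_idx, val_idx, test_idx = split_indices(25000)
```

For roughly 25,000 samples this gives about 20,000 training samples and 2,500 each for validation and testing.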
For fairness, in the ablation studies we kept the same configuration for all compared neural models. The experimental setup and the hyperparameters of the convolutional neural networks are presented in Table 1.

Results and Discussion
Point forecasts are evaluated with the mean arctangent absolute percentage error (MAAPE) and the root mean square error (RMSE):

$$\mathrm{MAAPE} = \frac{1}{N}\sum_{t=1}^{N}\arctan\left(\left|\frac{y_t - \hat{y}_t}{y_t}\right|\right), \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2},$$

where $\hat{y}_t$ is the forecast value, $y_t$ is the actual outcome value at time $t$, and $N$ is the number of evaluated points. The mean absolute percentage error (MAPE) is one of the most widely used measures of forecast accuracy, owing to its scale independence and interpretability. However, MAPE has the significant disadvantage that it produces infinite or undefined values when the actual values are zero or close to zero. To address this issue, MAAPE calculates the mean arctangent percentage error between the forecast and the eventual outcomes. MAAPE preserves the philosophy of MAPE while overcoming the problem of division by zero: by treating the ratio as an angle instead of a slope, it bounds the influence of outliers [71].

For probabilistic forecasting evaluation, there are three commonly used attributes: reliability, sharpness, and resolution. Reliability refers to how close the predicted distribution is to the ground truth. Sharpness means how tightly the predicted distribution covers the actual curve. Resolution signifies how much the predicted interval varies over time. Measures such as the Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics assess the unconditional coverage of a probabilistic forecast rather than its sharpness or resolution. In this paper, the performance of probabilistic forecasting is evaluated by the average pinball score, which is a comprehensive metric that considers not only reliability but also sharpness and resolution. The quantile score has the same form as the quantile loss, and the pinball score is defined as follows:

$$\mathrm{Pinball} = \frac{1}{T_{\mathrm{test}}\,Q}\sum_{t=1}^{T_{\mathrm{test}}}\sum_{q} L_q\left(y_t, \hat{y}_{t,q}\right), \qquad L_q\left(y_t, \hat{y}_{t,q}\right) = \begin{cases} q\,\left(y_t - \hat{y}_{t,q}\right), & y_t \ge \hat{y}_{t,q}, \\ (1 - q)\,\left(\hat{y}_{t,q} - y_t\right), & y_t < \hat{y}_{t,q}, \end{cases}$$

where $y_t$ is the truth at time $t$, $\hat{y}_{t,q}$ denotes the forecast of quantile $q$ at time $t$, $Q$ refers to the defined number of quantiles, and $T_{\mathrm{test}}$ represents the number of samples in the test set. In addition, in order to evaluate the candidates properly, the prediction interval (PI) should be assessed.
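The point and quantile metrics discussed above can be written in a few lines. The sketch below uses the standard textbook definitions of MAAPE, RMSE, and the pinball (quantile) loss; the function names are hypothetical, and treating a zero actual value with a zero error as zero MAAPE contribution is a convention assumed here rather than something stated in the paper.

```python
import numpy as np

def maape(actual, forecast):
    """Mean arctangent absolute percentage error (bounded by pi/2)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.abs((actual - forecast) / actual)
    # 0/0 yields NaN: count a perfect forecast of a zero load as zero error.
    ratio = np.where(np.isnan(ratio), 0.0, ratio)
    return float(np.mean(np.arctan(ratio)))

def rmse(actual, forecast):
    """Root mean square error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def pinball_score(actual, quantile_forecasts, quantiles):
    """Average pinball loss over all test points and quantile levels."""
    actual = np.asarray(actual, dtype=float)[:, None]         # shape (T, 1)
    forecasts = np.asarray(quantile_forecasts, dtype=float)   # shape (T, Q)
    q = np.asarray(quantiles, dtype=float)[None, :]           # shape (1, Q)
    diff = actual - forecasts
    loss = np.where(diff >= 0, q * diff, (q - 1.0) * diff)
    return float(loss.mean())

zero_load_error = maape([0.0], [1.0])   # stays finite, unlike MAPE
```

The last line shows the property that motivates MAAPE: a zero actual load gives an error of π/2 instead of an infinite percentage error.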
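The Winkler score is another comprehensive measure that allows a joint assessment of the unconditional coverage and interval width. A central PI at time $t$ with a $100(1 - \alpha)\%$ confidence level is given as $[L_t, U_t]$, where $L_t$ and $U_t$ are the lower and upper boundaries of the PI. The Winkler score is defined as

$$\mathrm{Winkler}_t = \begin{cases} \delta_t, & L_t \le y_t \le U_t, \\ \delta_t + 2\left(L_t - y_t\right)/\alpha, & y_t < L_t, \\ \delta_t + 2\left(y_t - U_t\right)/\alpha, & y_t > U_t, \end{cases}$$

where the interval width $\delta_t$ is calculated by $\delta_t = U_t - L_t$. In this paper, we evaluate the Winkler score for a PI coverage of 80%. Lower pinball and Winkler scores indicate better performance.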
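A short sketch of the Winkler score makes the trade-off between width and coverage explicit. It uses the standard definition, in which an observation outside the interval is penalized by twice its distance to the nearest bound divided by α; the function name is hypothetical, and α = 0.2 corresponds to the 80% interval evaluated in this paper.

```python
import numpy as np

def winkler_score(actual, lower, upper, alpha=0.2):
    """Mean Winkler score of central prediction intervals [lower, upper]."""
    actual = np.asarray(actual, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    width = upper - lower
    below = np.maximum(lower - actual, 0.0)   # shortfall under the lower bound
    above = np.maximum(actual - upper, 0.0)   # excess over the upper bound
    return float(np.mean(width + 2.0 * (below + above) / alpha))

inside = winkler_score([1.0], [0.0], [2.0])    # covered: score equals the width
outside = winkler_score([3.0], [0.0], [2.0])   # missed by 1.0: width plus penalty
```

A narrow interval is rewarded only as long as it still covers the observation, so a model cannot lower the score simply by shrinking its intervals.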

Evaluation of Multitask Network.
In order to evaluate the effectiveness of our proposed multitask model, ablation studies were conducted on the contribution of the household profile branch. In detail, we randomly selected 10 households to construct the training, validation, and testing sets, each household contributing nearly 25,000 48-dimensional samples. Table 2 shows the results of this ablation study, comparing our proposed forecasting branch with (our proposed) and without (original) the household profile branch to demonstrate the effectiveness of the multitask design. Specifically, experiments were completed at 24:00, 20:00, 18:00, 16:00, 14:00, 12:00, 10:00, 8:00, and 4:00. As discussed in [29], the load profile has a completely different scale at different times, for example, higher consumption in the evening and lower consumption in the morning or after midnight, so each time needs to be studied independently. Table 2 shows that our proposed neural network with the household profile branch has a clear advantage from 12:00 to 24:00 on MAAPE and RMSE, reflecting its ability to learn the uncertainties of different households. However, from 4:00 to 10:00 the performances of both models were close, and sometimes the original one even performed better. The main reason is that the actual load in this span is relatively small, so a small fluctuation in the prediction can cause obvious errors in the metrics. Moreover, low load is not easy to predict, yet these errors have only a minor influence on load forecasting. Between 16:00 and 24:00, the load consumption behaviors of individuals are most active and highly uncertain, which is when household load forecasting is most needed for demand response (DR).
In addition, we added further households to the training set and evaluated MCNN-HPC on the same 10 selected households as the testing set, in order to examine the influence of a growing dataset on the proposed model. In Table 3, columns 20, 30, 50, and 100 refer to the expanded training sets, and the related percentages give the improvements of MAAPE and RMSE on the 10 testing households. Table 3 shows how the performance changes with the growth of the training set. When 20 houses were introduced, the model improved significantly, and 30 houses strengthened this trend, which demonstrates that an expanded dataset helps the deep convolutional neural network to learn uncertainty discriminatively. When the dataset was enlarged to 50 or 100 households, the performance of our model showed little further improvement and gradually degraded. This is due to the scale of the tested convolutional neural network, which has a limited number of trainable parameters. If more blocks were added to obtain a deeper network, more households could strengthen the capability of MCNN-HPC to explore the regular pattern and uncertainty in individual behavior for load forecasting at household level.

Evaluation of Household Profile Branch.
To verify the superiority of the household profile branch, we compared our proposed network with the state-of-the-art model [37], which uses LSTMs for load forecasting at household level. This model identifies different households with one-hot encoding, which is concatenated to the load profile as an additional channel of the input sequence by a traditional feature fusion approach in order to extract the regular pattern and uncertainty. It should be noted that this method concatenates the representation vectors of different branches, which is quite different from ours. In this experiment, we randomly selected 20 houses. For a more detailed comparison, we chose 11 different times to evaluate both methods, adding 6:00 and 22:00. Figure 4 compares the average performances of our proposed model and the state-of-the-art model with one-hot encoding, where MCNN-HPC had a clear advantage on MAAPE and RMSE. Specifically, at times when the load stays at a low level, our proposed model outperformed one-hot encoding, indicating a more powerful capability of detecting discriminative nonlinear relationships in complicated cases. This ablation study supports the advantage of the household profile branch with its feature fusion process in load forecasting at household level.
In addition, we evaluated the effectiveness of feature fusion, with particular attention to the structure of the fusion in the network. Four cases, including only Feature Fusion 1 mode, Feature Fusion 2 mode, Feature Fusion 3 mode, and our proposed strategy, were compared. Results are illustrated in Figure 1.
The last one we evaluated, called full fusion, integrates the household profile encoding into each block through learnable weights. We randomly selected 30 households and benchmarked the five cases against the network without the household profile branch. Results are shown in Table 4, where Fusion 1, Fusion 2, and Fusion 3 give relatively poor performances on MAAPE and RMSE. The full fusion strategy performs similarly to our proposed model and at some times even better. However, the full fusion model incurs a high computation cost because the number of learnable parameters grows rapidly with additional blocks and fully connected layers. Therefore, we preferred our proposed lighter approach, which preserves the balance between effectiveness and efficiency.

Evaluation of Generalization of Our Proposed Model.
To evaluate the generalization of our proposed model, we independently trained one model without the household profile branch for each household. We adopted the 10 households selected in Section 3.2 and obtained 10 well-trained individual models. Then, the same 10 households were used to train and test our proposed MCNN-HPC, which was compared with the average performance of the 10 individual models. Table 5 shows the results and demonstrates that our proposed model outperformed the individual models on the overall MAAPE and RMSE, reflecting the good generalization of MCNN-HPC. These experiments are also promising for applications in the electricity market.

Evaluation of Our Proposed Model on Probabilistic Load Forecasting.
Probabilistic load forecasting plays a crucial role in DR because it provides more informative results for consumer behavior analysis. In this section, the 10 households selected in Section 3.2 were divided into training, validation, and testing sets to evaluate the performance of our proposed model on probabilistic load forecasting. We conducted an ablation study comparing MCNN-HPC with and without the household profile branch to verify the effectiveness of the encoding strategy in this area. Table 6 gives the results, where the "our proposed" items refer to the improvement ratio achieved by MCNN-HPC. Our proposed model performs better on Pinscore and Winkler80 at different times, indicating the positive role of the household profile branch in probabilistic load forecasting at household level.
In the same way, we tested 10 well-trained individual models for the 10 households, and their average performance was then used to evaluate the generalization of our proposed model on probabilistic load forecasting. Table 7 shows the experimental results on Pinscore and Winkler80. At most times, MCNN-HPC achieved better accuracy than the individual models on average, which indicates good generalization of our proposed model in probabilistic load forecasting at household level.

Conclusion
This paper for the first time proposes a multitask deep neural network for load forecasting at household level. One of the two branches is built on multiscale dilated convolutions for forecasting. The other branch, which includes a deep convolutional autoencoder, is responsible for extracting the specific behavior of different households and feeds a novel mechanism of feature fusion between the two branches, which can be interpreted as a superior feature selection process and leads to a remarkable improvement in accuracy. We conducted ablation studies to verify the performance of MCNN-HPC. The findings demonstrate state-of-the-art results, including the advantage of the multitask design, the effectiveness of household profile encoding, and the good generalization of our proposed model in both point and probabilistic load forecasting. In other words, MCNN-HPC is more capable of exploring the regular pattern and uncertainty in time-series analysis. This paper focuses on sharing attempts and lessons in applying deep learning techniques to household load forecasting. Future work includes designing a more efficient household encoding strategy based on attention networks. Moreover, additional features such as holidays or weather are expected to further improve the performance of the deep neural network.

Data Availability
The data used to support the findings of this study have been deposited in the Ireland CER repository; the website is https://www.ucd.ie/issda/data/commissionforenergyregulationcer/

Conflicts of Interest
The authors declare no conflicts of interest.