From time series to image analysis: A transfer learning approach for night setback identification of district heating substations



Introduction
With the development of the smart thermal grid framework, district heating systems play an important role in sustainable thermal energy production [1][2][3] and dominate the heating markets in Nordic countries [4]. Therefore, the energy efficiency of district heating systems is of high importance.
Heat loads of district heating substations are significantly impacted by the control settings. Night setback is one of the control settings that is used to lower the indoor temperature during the night, with the purpose of saving energy through reduced heat losses due to the decreased difference between indoor and outdoor temperature. It is a common way to perform temporary heat load reductions [5]. By applying night setback, the indoor temperature of a building is lowered at night without reducing the comfort level, in order to save energy and thus cost for the users. Such night setback control is commonly integrated with the building management system (BMS) of commercial as well as residential buildings [6]. As a result of applying the night setback setting, daily load patterns generally show a dip in the evening due to the temperature control and a sudden peak in the morning due to the need to warm up the cooled building to the set point temperature. However, the total heat usage of the building is not decreased by applying such a setting in modern, well-insulated buildings [7]. In addition, the co-occurrence of sudden morning peak demands leads to peaks in the district heating system that can be problematic for utility companies. Therefore, identification of night setback is of great interest to energy stakeholders.
In this study, a new approach based on image analysis and transfer learning is proposed for night setback classification of district heating substations. The proposed approach shifts the research problem from the conventional time series domain into the area of image analysis by converting energy usage series into the corresponding heatmap images. A transfer learning approach based on different pre-trained deep neural networks is then used for heatmap image classification. The proposed approach has two main advantages: first, heatmap images present the load pattern more directly than conventional time series plots; second, this way of framing the problem enables the use of advanced deep learning and transfer learning algorithms.
The literature review is split into three sections, Section 2.1, Section 2.2 and Section 2.3, which cover past studies of heat load pattern analysis in district heating, applications of heatmap plots in energy domains, and image analysis using deep learning and transfer learning methods, respectively. From the literature, the following research gaps are identified.
Research gaps identified from Section 2.1:
• Heat load pattern analysis concerning night setback identification of district heating has not been thoroughly studied in the literature due to the lack of installed high-resolution smart meters.
• Machine learning approaches have not been well studied for heat load pattern analysis.
• Studies that focus on night setback identification of district heating were not found in the literature.
Research gaps identified from Section 2.2:
• Heatmaps (also called carpet plots) are useful tools that have been used in different areas of the energy domain. However, these heatmaps were analysed manually by domain experts. In addition, heatmap plots have not been used for identifying night setback in district heating or in the energy domain in general.
Research gaps identified from Section 2.3:
• Deep convolutional neural networks (DCNNs) are regarded as the state of the art for computer vision tasks. Transfer learning alleviates the slow convergence and lack of training data faced by conventional DCNN approaches. However, neither DCNNs nor transfer learning has been studied in district heating.
The motivation of this study is to fill these research gaps.
The main contributions of this study are:
• A new problem formulation is proposed, which shifts the original research problem from the conventional time series domain into the image domain by converting hourly meter reading data into heatmap plots. Such a problem formulation makes transfer learning possible. In addition, it remains easy to switch back to manual examination if the model fails, as the proposed formulation makes it much easier for domain experts to make the judgement.
• The proposed image analysis approach based on transfer learning is the first attempt in this problem domain. Fundamental features such as edge, gradient and color detection learned by pre-trained models are needed for almost all types of images [26]. In addition, high-level features of pre-trained models, such as the eyeball of a cat or the feather of a bird, do not exist in heatmap images of district heating substation energy usage. For this reason, only the fundamental features are necessary and important, which makes transfer learning a suitable approach in this case.
To verify the effectiveness of the proposed approach, real operation data of 133 substations in Oslo are collected and used as a case study. Precision, recall, f1 score and accuracy [8] are measured for six types of pre-trained deep neural networks, Resnet [9], Vgg [10], Alexnet [11], Squeezenet [12], MobilenetV2 [13] and Densenet [14], and their variations. In addition, the training time and the size of each model are reported.
The rest of the paper is organized as follows: past studies of applying heatmap image analysis in energy domains are reviewed in Section 2, followed by an overview of the proposed method and a description of the dataset used in the experiment in Section 3. In Section 4, the experiment results are presented and discussed. Conclusions are presented in the last section.

Heat load patterns analysis in district heating
In 2013, Gadd and Werner [6] presented an analysis of heat load patterns of district heating substations. Hourly meter readings of 141 buildings in the south-west of Sweden were used in this study. Load patterns with four different control settings, namely continuous operation control, night setback control, time clock operation control 5 days a week and time clock operation control 7 days a week, were manually identified from weekly heat load plots. The sample plot of weekly heat load patterns due to the night setback setting used in this study is shown in Fig. 1.
As Fig. 1 shows, each daily pattern within the week of December-February exhibits a morning peak at approximately 7:00 and an evening dip at approximately 23:00.
In 2015, Gadd and Werner further presented a fault detection approach based on manual analysis of meter readings [15]. Three categories of faults were identified: unsuitable heat load pattern, low average annual temperature difference and poor substation control. A set of theoretical rules was applied to identify the different faults. For example, all buildings with pronounced night setback control were considered to have unsuitable heat load patterns. One challenge of performing such an analysis is that knowledge about the activities in the buildings is required.
In 2019, Calikus et al. [16] applied a data-driven approach based on the k-shape clustering algorithm for discovering heat load patterns in district heating. The clustering method was applied to a group of buildings to identify representative patterns of heat load profiles and different control strategies. After clustering, the results were examined by a domain expert, which took a large manual effort. Although the proposed approach showed promising results, domain knowledge was applied and the results obtained in this study were not leveraged to automate the identification of control strategies and reduce future manual effort.
Some other recent studies of heat load pattern analysis focus on daily load peak forecasting. Goia et al. [17] proposed a two-step procedure for short-term peak load forecasting using historical hourly heating demand data of district heating systems. More specifically, clustering was performed first to classify the daily load curves. Then, based on the clustering results, a family of functional linear regression models was used for forecasting. Results showed that the proposed method outperforms other conventional regression models in terms of mean and median absolute errors, standard deviation and maximum error.
Guelpa et al. [17] proposed a physical model to evaluate the heat demand of the district heating network in Turin, aiming for morning peak shaving. This physical model was developed based on mass, momentum and energy conservation equations. In addition, another physical model was used to decide the maximum thermal anticipation value that maintains the desired comfort level inside the building. Results showed that the primary energy consumption reduction achieved by the proposed method is approximately 0.8% at an average environmental temperature of 0 °C. However, the objective of these studies is energy conservation by reducing peaks of the daily load patterns.
In the district heating literature, studies of load pattern analysis are scarce, and no study focusing on night setback identification was found. Methods used in studies before 2015 are either simple visualizations or applications of theoretical rules, while physical modelling and data-driven approaches are used in more recent studies. An unsupervised learning method is used in the most recent study [16], which is the only study that utilizes a machine learning approach.
F. Zhang et al.

Applications of heatmap plots in energy domains
The heatmap is a useful tool for identifying important patterns of time series in energy domains that might be difficult to detect in conventional time series plots [18]. A selection of the literature is given here to indicate why this visualization method could also be of use for analysing data from district heating substations; many more examples can be found.
Costa et al. [19] applied heatmap plots to visualize time series data from the Environmental Research Institute building at University College Cork (UCC) in Ireland, aiming to compare the hourly electricity and heat consumption of the building against the outdoor temperature and solar radiation. The results showed that heat was supplied mainly in the early morning, while electricity consumption increased during the occupied hours of weekdays, as the heating system consists of a water-to-water heat pump coupled with underfloor heating. Such characteristics of the time series data were clearly revealed by the heatmap plots.
Salmon et al. [20] used heatmap plots to visualize the hourly net exported energy over a year. The dataset used in this study consists of single-family houses in Denmark, Sweden and Finland, an office building in Singapore and a multi-family house in Spain. The study concludes that heatmap plots provide useful information for building and grid designers, such as the periods of the year during which a building exports energy to, or demands energy from, the grid. They can also show peak values of imported and exported energy and the period when a building is self-sufficient if a storage system is present. Similar conclusions were derived from a later study [21] by the same authors.
Raftery and Keane [22] proposed a contour heatmap plot for building performance data analysis. Three application areas of the proposed plot were demonstrated in this study. The first was visualizing patterns of annual building performance data of an office building in Seattle. The second was visualizing the discrepancies between simulated and measured hourly electricity consumption, which is helpful for monitoring model performance during the calibration process. The last was comparing plots of recent lighting electricity consumption, plotted by hour of the day and day of the week, with those for a fault-free baseline dataset. The study concludes that the approach is useful for identifying schedule-related issues.
Jayathissa et al. [23] proposed a simulation method to evaluate a dynamic photovoltaic shading system, combining both the electricity generation and the energy demand of the building. Heatmap plots were used to visualize the influence of different PV panel angles on the system performance, with the aim of selecting an optimum angle that maximizes PV generation and minimizes the demand for heating, cooling and lighting. In addition, a heatmap plot was used to plot net energy against hour of the day and month of the year at the optimal angle.
Litiu et al. [24] applied heatmap plots for visualizing building performance data. The datasets used in this study were collected from three office buildings in Sweden. Building-specific patterns, baseline operational performance and missing measurements were easily identified from the heatmap plots. The authors point out that heatmap plots are useful for building performance data visualization and can potentially be used for various building performance management practices, such as continuous performance optimization, ongoing commissioning monitoring, and fault detection and prevention.
The literature shows that heatmap plots are useful for visualizing time series data in energy domains. However, the heatmap plots used in the literature were analysed manually by domain experts. In addition, heatmap plots have not been used for analysing night setback in district heating.

Image analysis using deep learning and transfer learning
Computer vision has increasingly attracted researchers' attention in recent years due to its diverse application areas. Among the methods for image analysis, approaches based on deep convolutional neural networks (DCNNs) have achieved state-of-the-art results in various computer vision tasks [25]. Common application areas of DCNNs are medical image classification [26,27], classification of building construction materials [28,29], road condition classification [30][31][32], water quality classification [33], smart surveillance [34], self-driving [35,36] and so forth. However, training a DCNN from scratch takes a lot of time and requires large annotated datasets, which are generally difficult to obtain. With the development of pre-trained models and transfer learning, the problems of slow convergence and lack of training data are alleviated. Transfer learning has been successfully applied in several problem domains in the literature, such as medical image analysis [37][38][39][40], hand gesture recognition [41], crop pest classification [42], safety guardrail detection [43] and so forth. According to the literature, neither image classification using DCNN-based methods nor transfer learning has been studied in the energy domain in general.

Overview of CNN and transfer learning
Transfer learning refers to using and fine-tuning a pre-trained model for a specific task that is different from the one it was originally trained for. Although the target data is from another domain in most cases, part of the knowledge learned by a pre-trained model is common and can be transferred across different domains [44].
The basic idea of transfer learning is to first train an ANN on a large dataset carefully labelled by domain experts, such as ImageNet [45]. The parameters learned in this first step are then transferred to another ANN. In other words, the learnable parameters of the new ANN are initialized with the pre-trained weights and biases instead of being trained from scratch. In most cases, the targets of the pre-trained model and the new model are of a different nature. For instance, a model is trained to classify images of dogs and cats in the first setting, while the target of the new model is to classify images of ants and wasps in the second setting [46]. The overall process of transfer learning is shown in Fig. 2.
The building blocks of transfer learning are pre-trained models whose weights have already been trained on another dataset. In the literature, CNN-based pre-trained models have been shown to be the state of the art for computer vision tasks [25]. A standard CNN architecture is shown in Fig. 3.
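Assume that at the $l$th layer the input is $a^{[l-1]}$ of size $(n_h^{[l-1]}, n_w^{[l-1]}, n_c^{[l-1]})$, where $h$, $w$ and $c$ denote the height, width and number of channels respectively. The number of filters is $n_c^{[l]}$, and the $n$th filter $K^{(n)}$ is of size $(f^{[l]}, f^{[l]}, n_c^{[l-1]})$. The bias of the $n$th convolution is $b_n^{[l]}$. The convolution result of the input and the $n$th filter is calculated by Eq. (1).

$$\mathrm{conv}\left(a^{[l-1]}, K^{(n)}\right)_{x,y} = f\left(\sum_{i=1}^{f^{[l]}} \sum_{j=1}^{f^{[l]}} \sum_{k=1}^{n_c^{[l-1]}} K^{(n)}_{i,j,k}\, a^{[l-1]}_{x+i-1,\,y+j-1,\,k} + b_n^{[l]}\right) \tag{1}$$

where $f(\cdot)$ denotes the activation function (not to be confused with the filter size $f^{[l]}$), and $x$ and $y$ are the pixel locations along the height and width dimensions of the input image respectively.

The final output $a^{[l]}$ of all $n_c^{[l]}$ filters is calculated by Eq. (2).

$$a^{[l]} = \left[\mathrm{conv}\left(a^{[l-1]}, K^{(1)}\right),\ \mathrm{conv}\left(a^{[l-1]}, K^{(2)}\right),\ \ldots,\ \mathrm{conv}\left(a^{[l-1]}, K^{(n_c^{[l]})}\right)\right] \tag{2}$$

The purpose of the pooling layer is to down-sample the feature map without changing the number of channels. The most commonly used pooling approach is max pooling. Assume the pooling size is $p^{[l]}$ and $g^{[l]}$ is the pooling function. The input is $a^{[l-1]}$ of size $(n_h^{[l-1]}, n_w^{[l-1]}, n_c^{[l-1]})$, and the output of a pooling layer is $a^{[l]}$ of size $(n_h^{[l]}, n_w^{[l]}, n_c^{[l]})$ with $n_c^{[l]} = n_c^{[l-1]}$, which is calculated by Eq. (3).

$$a^{[l]}_{x,y,z} = g^{[l]}\left(\left(a^{[l-1]}_{x+i-1,\,y+j-1,\,z}\right)_{(i,j) \in [1, 2, \ldots, p^{[l]}]^2}\right) \tag{3}$$

where $(i, j) \in [1, 2, \ldots, p^{[l]}]^2$, $x$ and $y$ are the pixel locations along the height and width dimensions of the input, and $z$ is the input channel.

Six types of commonly used pre-trained DCNNs are used in this study: Resnet, Vgg, Alexnet, Squeezenet, MobilenetV2 and Densenet. The selected models are the most popular ones in the literature for image classification using transfer learning [47][48][49] and have achieved satisfactory results in different domains. A short summary of each pre-trained model is shown in Table 1.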
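To make the convolution and pooling operations of Eqs. (1)-(3) concrete, the following is a minimal pure-Python sketch rather than the implementation used in the study. It assumes a stride of 1, no padding, and ReLU as the activation; the function names and the toy input are hypothetical.

```python
def relu(v):
    """ReLU activation, used here as an assumed choice for f in Eq. (1)."""
    return max(0.0, v)


def conv_single_filter(a, kernel, bias, activation=relu):
    """Eq. (1): convolve input a[h][w][c] with one filter kernel[f][f][c].

    Assumes stride 1 and no padding, so the output is (h-f+1) x (w-f+1).
    """
    f = len(kernel)
    n_h, n_w, n_c = len(a), len(a[0]), len(a[0][0])
    out = []
    for x in range(n_h - f + 1):
        row = []
        for y in range(n_w - f + 1):
            total = bias
            for i in range(f):
                for j in range(f):
                    for k in range(n_c):
                        total += kernel[i][j][k] * a[x + i][y + j][k]
            row.append(activation(total))
        out.append(row)
    return out


def conv_layer(a, kernels, biases):
    """Eq. (2): stack the feature maps produced by all filters."""
    return [conv_single_filter(a, k, b) for k, b in zip(kernels, biases)]


def max_pool(channel, p, stride=1):
    """Eq. (3) with g = max, applied to one channel (a 2-D list)."""
    n_h, n_w = len(channel), len(channel[0])
    return [
        [
            max(channel[x + i][y + j] for i in range(p) for j in range(p))
            for y in range(0, n_w - p + 1, stride)
        ]
        for x in range(0, n_h - p + 1, stride)
    ]


# Toy 3 x 3 single-channel input and one 2 x 2 filter of ones (hypothetical).
toy_input = [[[1.0], [2.0], [3.0]],
             [[4.0], [5.0], [6.0]],
             [[7.0], [8.0], [9.0]]]
ones_filter = [[[1.0], [1.0]], [[1.0], [1.0]]]
feature_map = conv_single_filter(toy_input, ones_filter, bias=0.0)
pooled = max_pool(feature_map, p=2)
```

Practical pooling layers usually use a stride equal to the pooling size; the `stride` argument is included so that this case can also be tried.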

Data description
A dataset of hourly measurement data from the primary side of district heating substations in Oslo for the year 2017 is used in the case study. The dataset was made available with permission from the utility company using the developed web API and contains the following parameters: energy (sum for each hour), volume flow (sum for each hour), supply temperature (instantaneous value at the end of each hour) and return temperature (instantaneous value at the end of each hour). The meter readings range from 1st Jan 2017 to 31st Dec 2017, giving 8760 data points in total per substation.
This study uses data collected from 133 substations. The heatmap plots for these substations were manually examined by domain experts. Of these, 25 substations were identified by the experts as exhibiting night setback characteristics and labelled as 'night setback', while the remaining substations were identified as not having a night setback setting throughout the year and labelled as 'non-night setback'.

The proposed method
The overall process of the proposed approach is shown in Fig. 4. Using the labelled dataset described in Section 3.2, the yearly energy usage data of each substation is sliced into monthly series, and the heat energy values of each monthly series are normalized into the range [0, 1] by Eq. (4).

$$x_{i,\mathrm{norm}} = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{4}$$

where $x_i$ is the $i$th data point in the dataset $x$, $\max(x)$ is the maximum energy value during the month, $\min(x)$ is the minimum energy value during the month, and $x_{i,\mathrm{norm}}$ denotes the $i$th data point after normalization. A sample of a normalized monthly time series is shown in Fig. 5.
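As an illustration of the min-max normalization in Eq. (4), a minimal sketch is given below. The function name and the sample readings are hypothetical, and the handling of a constant series (mapped to zeros) is an assumption that the text does not address.

```python
def min_max_normalize(values):
    """Scale one monthly series into [0, 1] following Eq. (4)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant month carries no pattern, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical hourly energy readings for illustration only.
monthly_energy = [12.0, 15.0, 30.0, 21.0, 12.0]
normalized = min_max_normalize(monthly_energy)
```

Because the minimum and maximum are taken per month, each monthly image uses the full colour range regardless of the absolute load level of the substation.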
Each time series is then converted into a corresponding heatmap image, which is a straightforward procedure. The x axis of the heatmap image represents the days of the month, the y axis represents the hours of the day, and the colours reflect the normalized energy usage at the corresponding hour of each day.
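The conversion from an hourly series to the heatmap layout described above can be sketched as a simple reshaping step. This is an illustrative sketch, assuming that the series starts at hour 0 and covers whole days; the function name is hypothetical.

```python
def series_to_heatmap(hourly_values, hours_per_day=24):
    """Arrange an hourly series as a grid: rows = hour of day, columns = day."""
    if len(hourly_values) % hours_per_day != 0:
        raise ValueError("series must cover whole days")
    n_days = len(hourly_values) // hours_per_day
    return [
        [hourly_values[day * hours_per_day + hour] for day in range(n_days)]
        for hour in range(hours_per_day)
    ]


# Hypothetical 2-day series whose values encode day * 100 + hour.
series = [day * 100 + hour for day in range(2) for hour in range(24)]
grid = series_to_heatmap(series)
```

The resulting grid can then be rendered with any colormap-based plotting routine (for example `matplotlib.pyplot.imshow`, named here as an assumption about tooling) to obtain the image that is passed to the classifier.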
The corresponding heatmap image of Fig. 5 is shown in Fig. 6. Samples of a substation labelled as night setback and that labelled non-night setback are shown in Figs. 7 and 8, respectively.
Figs. 5-8 also reveal one advantage of the proposed approach. Compared with the original time series plots, the heatmap images give domain experts a clearer and easier view for judging whether or not a substation has night setback. The heatmap of a substation with night setback shows a morning peak that starts at roughly 4:00 and an evening dip at 20:00 for several days in a row within the month. The heatmap of a non-night setback substation shows a more random pattern. Such patterns are difficult to observe by examining the original data series. Therefore, even if the model fails, it is much easier for a domain expert to perform a manual check of the heatmap images of uncertain substations than to look into the original time series plots.
Since night setback only affects the space heating load, the change in load due to night setback is less obvious or non-existent in warm months. Therefore, heatmap images of the relatively cold months (January, February, March, October, November and December) are used in the experiment, while data from the other months are filtered out.
After data filtering, each heatmap image is resized to the input image size used to train the pre-trained models. Then, a conservative data partition approach is used to randomly split the entire dataset into a 40% training set and a 60% testing set in a stratified fashion. A summary of the data after filtering and partition is shown in Table 2.
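A stratified 40/60 partition of the kind described above can be sketched as follows. This is only an illustration: the function name, the fixed seed and the toy label list are hypothetical, and whether the unit being split is a monthly image or a substation is left open here.

```python
import random


def stratified_split(labels, train_fraction=0.4, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1 = night setback, 0 = non-night setback.
labels = [1] * 10 + [0] * 40
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class separately ensures that the minority 'night setback' class appears in both the training and the testing set in the same proportion as in the full dataset.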
Data augmentation refers to creating random variations of the input data so that a new augmented image appears different from the original image. Data augmentation does not change the meaning of the data and can improve the generalization ability of the models. Examples of commonly used data augmentation techniques for images are rotation, flipping, brightness change and so forth. Applying data augmentation in a principled way has been shown to be useful for reducing overfitting of deep neural networks [50]. However, in the context of sequential data, conventional image augmentation methods such as random flipping or rotation may damage the temporal information within the original time series. Therefore, the data augmentation strategy should be chosen more carefully in this case.
Three data augmentation techniques are applied in the proposed approach that add variations to the original data but keep the temporal characteristics:
• Gaussian noise with zero mean and a standard deviation of 0.01, which has been found to be a reasonable value for time series data augmentation [51][52][53], is added to the time series data. Adding such minor noise makes the input data vary while the peaks and valleys of each hour remain well preserved.
• A sliding window of 24 h is applied, which shifts the original time series by one day. This preserves the sequential meaning of the original time series, and the corresponding label is not changed by such a shift. In addition, shifting by a small number of time steps does not cause much data loss.
• Mixup is a relatively new data augmentation approach proposed by Zhang et al. [54]. It is particularly useful when the amount of training data is small or when the training data of a specific classification task is very different from the dataset used for the pre-trained model. In addition, Mixup is dataset independent, and therefore domain expert knowledge is not required when applying it for data augmentation. The overall process of Mixup is to first randomly select an image to be mixed with the original image and then choose a random weight value. A new mixed image is created by the weighted combination of the two images.

The standard two-phase training procedure of transfer learning is used. In the first phase, the layers prior to the newly added classification layers are frozen. Since the pre-trained layers are already well trained and capable of capturing general concepts, such as identifying edges and gradients of an image, freezing these layers prevents the carefully pre-trained weights from being disrupted during training. In the second phase, all layers are unfrozen and trained. However, the fundamental features such as edges and gradients learned by the first pre-trained layers are useful for almost all tasks, while later layers learn features that are more task specific, such as the feather of a bird, the eyeball of a cat and so forth [55], which are not useful for night setback identification. Therefore, the discriminative learning rate proposed by Howard and Ruder [56] is used in phase 2 training. The main idea of the discriminative learning rate is that a lower learning rate is used for the bottom layers of a pre-trained model and a higher learning rate is applied to the top layers, such that the weights of the first layers change less and more slowly than those of the later layers. Due to time and resource constraints, a fixed number of epochs is used as the stopping criterion during training. Meanwhile, both training and validation loss are monitored to avoid overfitting. In addition, an optimal learning rate range is set for each model based on the cyclical learning rate approach for training neural networks proposed by Smith [57]. The hyper-parameters used in the training phases, such as the type of optimizer, the number of epochs and the learning rate range used for each model, are reported in Table 3.
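The three augmentation techniques described above (Gaussian noise, a 24 h shift and Mixup) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the shift is implemented by dropping the first 24 hours of the series, and only the images (not the labels) are mixed.

```python
import random


def add_gaussian_noise(series, std=0.01, rng=random):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    return [v + rng.gauss(0.0, std) for v in series]


def shift_by_window(series, window=24):
    """Shift the series by one window (24 h) by dropping its first day.

    Assumption: the shift discards the leading hours, which keeps the
    hour-of-day alignment of every remaining value.
    """
    return series[window:]


def mixup(image_a, image_b, weight):
    """Pixel-wise weighted combination: weight * a + (1 - weight) * b."""
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


# Hypothetical inputs for illustration.
rng = random.Random(0)
noisy = add_gaussian_noise([0.5] * 48, std=0.01, rng=rng)
shifted = shift_by_window(list(range(48)))
mixed = mixup([[1.0, 0.0]], [[0.0, 1.0]], weight=0.8)
```

In the original Mixup formulation the labels are combined with the same weight as the images; whether this is done in the present approach is not stated, so the sketch leaves the labels untouched.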
After training the models, out of sample testing is performed using the testing set.

Results and discussions
In addition to accuracy, other performance indicators that are useful for measuring model performance on imbalanced datasets [58] are used to evaluate the six pre-trained models: precision, recall and f1 score. The performance indicators calculated in the case study for classifying night setback using the transfer learning approaches defined in Table 1 are presented in Table 4. For Resnet, Vgg and Squeezenet, more than one pre-trained model is available, and all of them have been used in the study.
Results show that the overall performance of all the models is reasonably good given that they were trained on an imbalanced and relatively small dataset. Each model achieves an overall accuracy greater than 0.95 and an f1 score greater than 0.9 for night setback classification. Among all the models and their variations used in this study, the recall of Vgg11 is the highest, which means that Vgg11 captures more of the positive 'night setback' cases than the other models. However, the precision of Vgg11 is the lowest among all models used, which means that, of all the cases predicted as 'night setback' by Vgg11, fewer are actually 'night setback'. These results are in line with the precision-recall trade-off [8], while the f1 score, which is defined as the harmonic mean of precision and recall, is a more useful overall performance indicator. MobilenetV2 outperforms the others in terms of the overall accuracy and the f1 score for night setback identification. Overall, there are only small variations in the performance indicators between the models, i.e., a difference of less than 0.03 in f1 score and 0.01 in overall accuracy.
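For reference, the four performance indicators discussed above can be computed from predicted and true labels as sketched below. The function name and the toy labels are hypothetical and do not correspond to the results in Table 4.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, f1 and accuracy for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # f1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy example: 1 = night setback, 0 = non-night setback (made-up labels).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case one missed positive and one false alarm give a precision, recall and f1 of 0.75 but an accuracy of 0.8, which illustrates why accuracy alone can look favourable on an imbalanced dataset.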
Although a deeper neural network is theoretically more capable of solving complex problems than a shallow one, this is not always the case in practice. One potential reason is that a deeper neural network is more likely to overfit than a shallower one when the amount of training data is small [59]. Variations of the same type of pre-trained neural network are tried in the experiment. For instance, in the Vgg group, Vgg11, Vgg16 and Vgg19 are used. Vgg16 consists of 13 convolutional layers and 3 fully connected layers followed by a single Softmax layer, while Vgg19 is three layers deeper than Vgg16. However, in the experiment the relatively shallow Vgg16 shows both a higher overall accuracy and a better f1 score for night setback classification. Similarly, in the Resnet group, Resnet18 performs better than the deeper Resnet34 model in terms of both f1 score and overall accuracy.
Visualizing which regions of an image are most important to a CNN improves model interpretability. It gives insight into what the model has learned and why it makes a given prediction, which can be critical for model diagnosis. To provide such visual explanations, the gradient-weighted class activation mapping (GRAD-CAM) [60] and guided backpropagation [61] algorithms are used. For a given image, the GRAD-CAM algorithm calculates the gradient of the logits of the target class of interest with respect to the activation maps of a CNN layer. The importance of different regions of the given image is then indicated by the gradients averaged across each feature map. The guided backpropagation algorithm, on the other hand, isolates the important regions of the given image by keeping only positive contributions to the gradient during backpropagation. Sample GRAD-CAM and guided backpropagation plots of both night setback and non-night setback cases are shown in Figs. 11-14. The y axes are added back to the test images to better illustrate at which hours the peaks and dips occur.
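The weighting step of GRAD-CAM described above can be sketched on toy data. The sketch assumes that the activation maps of the chosen layer and the gradients of the target-class logit with respect to them have already been obtained from a deep learning framework; the function name and the toy arrays are hypothetical.

```python
def grad_cam(activations, gradients):
    """Combine K activation maps (each H x W) into one importance map.

    Each map is weighted by the spatial average of its gradient, the
    weighted maps are summed, and negative values are clipped to zero.
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: gradient averaged over the whole feature map.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for x in range(height):
            for y in range(width):
                heatmap[x][y] += alpha * fmap[x][y]
    return [[max(0.0, v) for v in row] for row in heatmap]


# Two toy 2 x 2 feature maps with made-up gradients.
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(activations, gradients)
```

In this toy case the second feature map has a negative average gradient, so its active region is suppressed and only the region highlighted by the first map remains. In practice the resulting map is upsampled to the input image size and overlaid on the heatmap image.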
In Figs. 11 and 12, the highlighted important regions are approximately located at both the morning peak and the evening dip, which is reasonably aligned with domain knowledge. Namely, when experts judge whether or not a substation shows night setback characteristics, higher importance is given to the peak in the morning and the dip in the evening, while the remaining hours of the day are of less importance. On the other hand, the plots generated by GRAD-CAM and guided propagation are relatively random for the non-night setback cases shown in Figs. 13 and 14.
In addition to the performance indicators shown previously, the size and training time of each model are reported in Table 5. Results show that the Squeezenet models are the most lightweight, and that Squeezenet10 trains faster and uses less memory (model size) than the other models used in this study. MobilenetV2 is also small with a relatively short training time, while the Vgg models and Alexnet require relatively large storage compared with the other models.
The choice of model depends on the cost of false predictions versus the cost of missed night setback cases, the time efficiency requirement and the cost of storage. Among these, time efficiency is the least important factor, since training a model on the fly is not required in this context. Based on the experiment results, MobilenetV2 is a good choice in terms of both its error measures and its storage requirement.
Last but not least, according to the conclusions derived from the literature review, there is a limited number of studies on night setback identification in district heating, and the methods used in these studies mostly rely on human inspection of the original energy usage time series. This means that there is no standard dataset for benchmarking, which makes it difficult to compare studies that use different methods.

Conclusions and future work
In this study, a new approach based on deep neural networks and transfer learning is proposed for night setback identification of district heating substations. The research problem is framed in a new way, namely by converting meter reading data into corresponding heatmap plots, which shifts the problem from the conventional time series domain into the domain of image analysis. By framing the research problem this way, it is possible to take advantage of transfer learning using pre-trained models, of which several are available. In addition, the proposed approach is flexible, so that even if the model fails, it is much easier for a domain expert to perform a manual check of the heatmap images of uncertain substations than to look into conventional time series plots. A dataset consisting of hourly meter readings of 133 substations in Oslo is used in the case study, among which 25 substations are labelled as night setback by two domain experts. 40% of the total dataset is used for training and 60% is used for testing. Although the training set is relatively small and imbalanced, results show that all models achieve reasonably good performance, with an overall accuracy greater than 0.95 and an f1 score greater than 0.9 for night setback classification using the proposed transfer learning approach. In addition, the training time and size of each model are measured and reported. Among all the models used in the experiment, MobilenetV2 outperforms the others in terms of both f1 score for night setback classification and overall accuracy. In addition, MobilenetV2 has a reasonably small size. Last but not least, the GRAD-CAM and guided propagation algorithms are used to improve model interpretability and to better understand which parts of the image are of high importance for the model's decision.
In the future, more pre-trained deep neural networks, such as Efficientnet, Inception and Xception, can be explored. Moreover, a different interpretation method, such as saliency maps, can be tried. In addition, there are several potential areas of improvement with respect to the data. Firstly, data from more substations can be used to improve model performance. Secondly, instead of relying on domain experts to manually examine and label the substations, checking with building managers whether or not a building has the night setback setting enabled can be an alternative, depending on the corresponding cost and availability. Moreover, although the performance of all models is reasonably good with the small and imbalanced training dataset, a different data partition strategy that allows the model to use more training data, together with data balancing strategies, can potentially be beneficial. Besides, min-max normalization is applied to the original time series before it is converted into the corresponding heatmap image; converting the raw time series into heatmap images without normalization, or pre-processing the raw time series with other types of normalization, can be explored as well. Finally, different sliding window lengths, Gaussian noise levels and Mixup weights can be explored for data augmentation, and an oversampling strategy can be used in future studies for data balancing.

Fig. 5. Example normalized monthly energy usage for a substation labelled as night setback.

Fig. 6. The corresponding heatmap image of the time series shown in Fig. 5, for a substation labelled as night setback.

Fig. 7. Example normalized monthly energy usage of a substation labelled as non-night setback.

Fig. 8. The corresponding heatmap image of the time series shown in Fig. 7, for a substation labelled as non-night setback.

Fig. 9. A sample training monthly heatmap image and the corresponding image after being shifted and adding Gaussian noise. Left: Sample Image 1 before augmentation. Right: Sample Image 1 after augmentation.

Fig. 10. Two randomly chosen sample training monthly heatmap images and the corresponding image after Mixup. Left: Sample Image 1. Middle: Sample Image 2. Right: Mixup result of 80% Sample Image 1 and 20% Sample Image 2.

Fig. 11. Sample testing monthly night setback heatmap image with morning peak at 5:00 and evening dip at 22:00 (left) and the corresponding GRAD-CAM (middle) and guided propagation (right) plots.

Fig. 12. Sample testing monthly night setback heatmap image with morning peak at 4:00 and evening dip at 20:00 (left) and the corresponding GRAD-CAM (middle) and guided propagation (right) plots.

Table 1
Summary of pre-trained models used in this study.

Table 2
A summary of the data after filtering and partition.

Table 3
Hyper-parameters used in training phases.

Table 4
Summary of the testing results.

Table 5
Summary of the model size and training time.