An Improved Non-Schedulable Load Forecasting Strategy for Enhancing the Performance of the Energy Management in a Nearly Zero Energy Building

This paper proposes an improved non-schedulable load forecasting (NLF) technique that can be utilized to enhance the performance of the energy management system (EMS) in a nearly zero energy building (nZEB). The suggested NLF is based on a long short-term memory (LSTM) framework in conjunction with a semi-supervised clustering (SSC) technique. It considers the most important features that may affect the energy consumption of the non-schedulable appliances (NAs), i.e. the number and identity of the residents, energy consumption, weather, temperature, humidity, outdoor irradiance, correlation with other loads, day of the week and holidays. The SSC algorithm is utilized to fill in the incomplete information on the residents' presence in the house, and its output constitutes one of the inputs of the LSTM-based technique, which provides as output a set of forecasting sequences of the NAs' energy consumption. Unlike published techniques, the proposed NLF method is not only based on modeling the residents' preferences and habits, but also treats them as variables that affect the performance of the nZEB's microgrid and EMS. Therefore, it predicts the residents' behavior considering its interdependence with the nZEB's microgrid, which can considerably contribute to the enhancement of the EMS effectiveness and performance. The implementation of the proposed NLF requires no additional hardware, only amendments in the EMS so that it considers the NLF's outcomes. To validate the effectiveness of the proposed NLF, selected Hardware-in-the-Loop (HiL) results from a real nZEB are presented.


I. INTRODUCTION
Buildings are responsible for one-third of the global energy consumption [1]. For this reason, specific measures to increase energy saving in buildings have been adopted by several authorities, such as Directives (EU) 2018/2002 [3] and (EU) 2018/844 [4] of the European Union, in order to reduce greenhouse gas emissions and develop a sustainable, competitive, secure and decarbonized energy system [2]. The EMS plays the key role in improving the efficiency and performance of a nZEB, since it is responsible for the proper scheduling of the appliances and the maximum exploitation of the energy generated by renewable energy sources (RES). This can be attained by considering as many as possible of the variables that may affect the nZEB performance, as well as the residents' preferences and habits. Therefore, the effectiveness of the EMS can be considerably increased if a forecasting technique is utilized to predict both the energy consumption of the nZEB loads and the energy production of the RES [5]. Such forecasts are required by several decision-making processes that may be embedded in an EMS, such as price forecasting, energy demand scheduling and maximum exploitation of the RES [6]. These processes can be supported by smart meter data analytics based on big data and artificial intelligence techniques [7]. Also, since the nZEB is a small-scale microgrid, improved EMS performance may be attained by adopting microgrid-based techniques [8].
Load forecasting methodologies can generally be categorized into long-term (more than a year), mid-term (from a month to a year), short-term (from one day to a month) and very short-term (less than 24 hours) forecasting methods [9]. Long-term load forecasts can be used for the energy management of power systems and their financial planning [10], while mid-term load forecasts can be used in the operational planning of power systems, such as maintenance scheduling and the control of hydrothermal electric energy generation systems [11]. Short-term and very short-term forecasting can be used to improve the performance of smart grids and also as real-time forecasting techniques that enhance the prediction accuracy of the mid- and long-term estimations [12].
Several computational methods have been proposed in the technical literature to capture the non-linear characteristics of load forecasting. The most popular techniques estimate the load curve by time series analysis with autoregressive moving average (ARMA) models [13] and support vector machine (SVM) based methods [14]. An improved load forecasting technique that combines an ARMA-based method with a non-Gaussian process has been proposed in [15]. Also, a grid search approach that automatically tunes the SVM model parameters to improve load forecasting has been presented in [16].
Several tools have been developed to improve the accuracy of building load forecasting, such as physical simulation models [17], [18], statistical analysis [19] and regression models [20]. Also, modern artificial intelligence techniques, such as extreme learning machines [21], artificial neural networks [22] and deep learning neural networks [23], [24], [25], have been adopted in the area of short-term load forecasting. A combination of an evolutionary algorithm and an artificial neural network, in which the former regulates the neural network parameters to improve the learning capability and the accuracy of short-term load forecasting, has been presented in [26]. Another hybrid technique has been presented in [27], where convolutional and LSTM neural networks were combined for short-term forecasting of the energy consumption of building loads. Finally, the combination of convolutional and LSTM techniques was also adopted in [28], where the inputs of the forecasting model comprise temperature and humidity datasets classified by season. The most recent load forecasting algorithms for residential buildings published in the technical literature mainly focus on forecasting the building's total energy consumption curve. Specifically, a novel hybrid deep learning model for predicting the energy consumption of smart buildings was proposed in [29], and a hybrid CNN-LSTM model for short-term individual household load forecasting was presented in [30]. Techniques for forecasting the total energy consumption of residential buildings were also proposed in [31]-[33]. However, these techniques ignore both the interdependence of the residents with the EMS and load shifting in the forecasting process.
The building's appliances can be categorized into programmable (PAs), controllable (CAs) and non-schedulable (NAs) appliances [34]. PAs are electric devices whose operating time can be scheduled by the residents (e.g. electric cooker, washing machine, dishwasher, etc.), and CAs are electric loads whose operation is controlled by one or more variables (e.g. heat pump, air conditioner, etc., where the variables are the indoor temperature and the fan speed). NAs are appliances whose operating time cannot be planned; their activation/deactivation either depends on the residents' preferences and does not allow time constraints to be imposed (e.g. personal computers, TV, etc.) or is automatically controlled (e.g. lights regulated by a motion detection system, refrigerator, water cooler, etc.).
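As an illustration of the taxonomy above, the three categories could be encoded in EMS software roughly as follows. This is a minimal sketch; the names `ApplianceCategory`, `INVENTORY` and `ems_can_act_on` and the appliance inventory are hypothetical and not taken from the paper. The point it captures is that the EMS can act on PAs (through their operating time) and CAs (through their control variables), whereas NAs can only be forecast.

```python
from enum import Enum


class ApplianceCategory(Enum):
    """Appliance categories of the nZEB (hypothetical encoding)."""
    PROGRAMMABLE = "PA"     # operating time can be scheduled, e.g. washing machine
    CONTROLLABLE = "CA"     # governed by control variables, e.g. heat pump
    NON_SCHEDULABLE = "NA"  # operating time cannot be planned, e.g. TV, refrigerator


# Hypothetical inventory following the examples given in the text.
INVENTORY = {
    "washing_machine": ApplianceCategory.PROGRAMMABLE,
    "dishwasher": ApplianceCategory.PROGRAMMABLE,
    "heat_pump": ApplianceCategory.CONTROLLABLE,
    "air_conditioner": ApplianceCategory.CONTROLLABLE,
    "tv": ApplianceCategory.NON_SCHEDULABLE,
    "refrigerator": ApplianceCategory.NON_SCHEDULABLE,
}


def ems_can_act_on(name: str) -> bool:
    """True for PAs and CAs; NAs can only be predicted, not scheduled."""
    return INVENTORY[name] is not ApplianceCategory.NON_SCHEDULABLE
```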
The EMS can increase energy saving and reduce electricity expenditure in a nZEB by properly regulating the operating time intervals of the PAs and the control variables of the CAs, considering several operating factors, such as the real-time electricity price, the electric energy consumed by each device, user preferences, the energy generated by the RES, the state of charge and energy price of the battery storage system (BSS), the weather forecast, the nZEB's construction characteristics, etc. [35]. Therefore, the predicted energy consumption of the NAs is important information for the EMS of a nZEB to reduce energy consumption, properly utilize the energy generated by the building's RES and effectively exploit the BSS. In other words, NAs' forecasting is not solely a matter of load energy consumption; it is a key factor for the effectiveness and performance of the EMS.
Due to the stochastic nature of the NAs, their load curve is usually neglected by the EMS in the real-time scheduling of residential appliances, as in [36]. In other studies, the NAs' load curve is experimentally determined from energy consumption measurements over a specific time period, and its forecast is then obtained by curve fitting, as in [37] and [38]. However, these techniques do not consider the factors that affect the NAs' operation and, consequently, they cannot capture the dynamic behavior required for accurate NAs' forecasting.
The research methods proposed to predict the NAs' load more accurately are mainly based on modeling the residents' behavior and habits. Specifically, forecasting of the NAs' energy consumption based on learning the residents' behavior has been presented in [39], and a single LSTM neural network for different household loads has been proposed in [40]. Clustering methods that group residents with similar consumption behaviors have been adopted in [41] and [42]. In particular, the forecasting method of [41] is based on load curve profiles, using a dynamic time warping technique to cluster the load curves and seek a canonical shape for each set of curves, while [42] leverages the comprehensive dataset available from a local smart grid demonstration project and explores the drivers behind the derived groups of electricity customers. Finally, a similar clustering-based method with a neural network that employs weather and calendar features and considers customer behavior similarities has been presented in [43].
It follows from the above that the prediction of the NAs' energy consumption has been examined in the published technical literature only from the perspective of the residents' preferences and habits, whereas the interaction with the EMS performance with respect to the time scheduling of the PAs and the control of the CAs has been ignored. Specifically, published research has mainly focused on modeling the residents' behavior to forecast the total NAs' load curve, without considering the correlation of the NAs with the operation of the PAs and CAs, which is an important issue for an EMS. Therefore, an integrated and easily applicable method for NAs' forecasting is required that addresses this problem from the points of view of both the energy load conditions and the interaction with the EMS, so that the effectiveness and performance of the EMS can be enhanced.
The scope and contributions of this research are the following:
• A multi-objective NLF strategy is proposed that is not only based on modeling the residents' preferences and habits, as in the published literature, but also treats them as variables that influence the operation of the nZEB's microgrid and the performance of the EMS. Specifically, the proposed NLF strategy can accurately predict the NAs' energy consumption by properly utilizing the most important features that affect their operation.
• An SSC k-means algorithm is employed to relieve the residents of the burden of constantly informing the system about their presence in the house. The output of the SSC k-means algorithm is one of the inputs of the LSTM neural network framework, which acts as the main forecasting technique for the NAs' energy consumption curve over a specific forecasting period.
• The outcome of the proposed NLF system can be utilized by the EMS of a nZEB to enhance its performance in the optimal scheduling of the PAs and the proper regulation of the CAs, as well as in the maximum utilization of the energy generated by the RES and the optimal exploitation of the BSS capability.
• The implementation of the NLF system requires no additional hardware, since it uses the data of the energy meters and sensors already employed for the operation of the EMS. The only amendment is the replacement of the EMS firmware, so that the outcome of the NLF is considered in the calculations. Therefore, the proposed NLF system can be installed in any existing or new EMS of a nZEB without affecting the residents' comfort and without increasing the installation cost.
• The effectiveness of the proposed NLF method and the enhancement of the EMS performance have been validated in a pilot nZEB, and several HiL results are presented to verify the reduction of the energy exchange with the grid and the maximum exploitation of the energy generated by the RES.
The rest of the paper is organized as follows. The overview of the proposed NLF technique is presented in Section II, and the algorithms of the proposed forecasting techniques are described in Section III. The testing results and discussion for a HiL system are presented in Section IV. Finally, the conclusions are summarized in Section V.

II. OVERVIEW OF THE PROPOSED NLF TECHNIQUE
The nZEB's architecture is based on an EMS that monitors and controls the operation of multiple energy sources and appliances. The control process is realized through a home area network, where the control signals are exchanged either via the internet or a local data network, while monitoring is accomplished through smart meters [44]. The EMS is responsible for the proper scheduling of the PAs' operating time and the regulation of the control variables of the CAs, so that energy peak shaving and a reduction of the electricity bill are attained, as well as for the proper control of the energy generated by the RES, so that it can be effectively exploited. Thus, the aim of the proposed NLF system is to predict the NAs' energy demand curve for a specific period ahead, considering the EMS operation, in order to improve the EMS performance through the proper regulation of the PAs, CAs, RES and battery energy storage. Fig. 1 illustrates the schematic layout of a nZEB's microgrid and the overview of the proposed NLF with an EMS. As can be seen, the nZEB is controlled by a combined system of EMS and NLF, which constitutes an Integrated EMS (IEMS). The NLF consists of two subsystems: the LSTM neural network framework, which is the main prediction algorithm for the NAs' load curve, and the SSC k-means algorithm, which provides information on the presence of the residents in the building. Specifically, the SSC is a supporting algorithm that compensates for the information missing from the residents' reports, so that the LSTM has the input dataset it needs to predict the NAs' electric load accurately.
To effectively manage the operation of the NLF, the prediction period and the recent past period are divided into N and M discrete time intervals, respectively, each of duration Δt. Thus, the input, output and control variables are vectors of values over N or M time slots. The number and the duration of the time intervals are decided by the user. Although a high number of short time intervals increases the density and the length of the prediction, the accuracy may be reduced, since the residents' habits change over time and depend on the season and on changes in their lives, such as changes in their work, their families, etc. Moreover, the computational burden may increase considerably, raising the requirements for the control unit and, consequently, the cost of the system. Therefore, satisfactory accuracy of the LSTM algorithm depends on the proper selection of the parameters N, M and Δt, striking a balance between prediction length and density on the one hand and accuracy and computational load on the other. A typical set of values, adopted in this paper, is N = 48 and M = 96 time intervals of Δt = 30 min each; thus, the prediction and recent past periods are 1 and 2 days, respectively.
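The arithmetic behind the chosen parameters, and one common way of turning a Δt-sampled history into (past M slots, next N slots) pairs for an LSTM, can be sketched as below. The paper does not state how its training windows are formed; the stride of one time slot and the function names `horizon_hours` and `make_windows` are assumptions made for illustration.

```python
from typing import List, Tuple

DT_MIN = 30  # Δt in minutes
N = 48       # prediction intervals
M = 96       # recent-past intervals


def horizon_hours(n_intervals: int, dt_min: int = DT_MIN) -> float:
    """Length in hours of a period of n_intervals slots of dt_min minutes."""
    return n_intervals * dt_min / 60


def make_windows(series: List[float], m_past: int = M, n_ahead: int = N
                 ) -> List[Tuple[List[float], List[float]]]:
    """Pair every M-slot history with the N slots that follow it (stride of 1 slot)."""
    pairs = []
    for start in range(len(series) - m_past - n_ahead + 1):
        past = series[start:start + m_past]
        future = series[start + m_past:start + m_past + n_ahead]
        pairs.append((past, future))
    return pairs
```

With the values of the paper, `horizon_hours(48)` gives 24.0 h and `horizon_hours(96)` gives 48.0 h, i.e. the 1-day prediction and 2-day recent past periods. A series of length L yields L − M − N + 1 windows.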
The LSTM is a neural network framework that predicts the NAs' energy consumption curve over a specific prediction period. It receives multiple inputs, such as the current NAs' energy consumption data and the activation/deactivation status of the PAs and CAs. These datasets are provided by the energy meters and sensors already used for the operation of the nZEB's EMS. Specifically, the current NAs' energy consumption and the current operating status of the PAs and CAs are determined from the measurements of the building's central energy meter, which records the total energy consumption, and of the energy meter of each PA and CA, while the other data are obtained by the respective sensors. Moreover, the LSTM considers the day of the week and the holiday periods, in order to properly handle the historical data. Also, the LSTM receives information from the SSC k-means algorithm on the number and identity of the residents present in the house.
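The text states that the NAs' energy consumption and the PA/CA operating status are derived from the central meter and the per-appliance meters, but it does not give the formula. A natural reading, assumed in the sketch below, is that the NAs' energy in each time slot equals the total metered energy minus the sum of the PA and CA meters, and that an appliance counts as active when its metered energy in the slot exceeds a small threshold. The function names and the 0.01 kWh threshold are hypothetical.

```python
from typing import Dict, List


def derive_na_energy(total: List[float],
                     pa_meters: Dict[str, List[float]],
                     ca_meters: Dict[str, List[float]]) -> List[float]:
    """E_NA,m = E_total,m - sum of PA and CA meter readings in slot m (clipped at 0)."""
    result = []
    for m, e_total in enumerate(total):
        metered = sum(series[m] for series in pa_meters.values())
        metered += sum(series[m] for series in ca_meters.values())
        # Clip small negative values caused by meter timing or rounding errors.
        result.append(max(e_total - metered, 0.0))
    return result


def on_off_status(energy: List[float], threshold_kwh: float = 0.01) -> List[int]:
    """1 if the appliance consumed more than the (hypothetical) threshold in the slot."""
    return [1 if e > threshold_kwh else 0 for e in energy]
```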
The LSTM utilizes the above information on the current moment and on the history to correlate the activation/deactivation status of the PAs and CAs with the usage of the NAs and, then, to determine the predicted NAs' energy consumption curve for the N time intervals ahead. Thus, contrary to the NAs' load forecasting methods published in the literature, which are based on a static model of the residents' preferences and habits, the LSTM is continually updated, since it considers the operating features of the nZEB's microgrid together with the active role that the residents may have in this process. Specifically, the LSTM treats the residents' preferences and habits as variables that change over time, since it relates the current condition to the history. This ensures that the LSTM always follows potential changes in the residents' intentions to use the various NAs and, thus, an accurate prediction of their energy consumption can be attained.
The role of the SSC k-means algorithm is to ensure that the LSTM has the necessary information on the residents' presence in the house, even when this information is not constantly provided by them. Specifically, although each resident is expected to inform the system of his or her presence in the house, it is quite probable that not all residents will do so, or not constantly. Thus, the SSC k-means algorithm fills the potential gaps in the information on the residents' presence and ensures that the LSTM has the necessary input data for its proper operation without affecting the comfort of the residents. This means that no strict rules are imposed on the residents to constantly provide information on their presence in the house; they can do so from time to time, just to improve the outcome of the SSC algorithm. However, it should be noted that the more consistently the system is updated on the residents' presence, the more accurate the outcome of the SSC, and consequently the information provided to the LSTM, will be.
Note that the information on the residents' presence could be obtained automatically by a presence detection system through each user's mobile phone. However, since this may not be acceptable to all residents and misleading information may also be provided (e.g. the mobile phone is left at home or is switched off because its battery is depleted), manually provided information may be considered the most suitable solution.

III. ALGORITHM OF THE PROPOSED NLF SYSTEM AND SIMULATION RESULTS
The overview flowchart of the proposed NLF neural network framework is illustrated in Fig. 2. It is a hybrid deep learning technique that consists of three parts: a) the formulation and feature scaling algorithm, which is responsible for the proper construction of the input datasets; b) the SSC k-means algorithm, which constructs the dataset of the residents' presence at home; and c) the short-term LSTM neural network framework, which is the computational part and provides the prediction of the NAs' energy consumption for the N time intervals. The features and operation of the three parts of the NLF are described below.

A. Formulation and feature scaling of the input data
The NLF algorithm is initiated with the preprocessing of the input variables, which are historical data for the recent past M time intervals. They are operating parameters and measurement data collected by the smart sensors and energy meters that are already installed in the nZEB for the operation of the EMS. They are formulated as vectors with a discrete value for each Δt time interval, as defined below.

…

where $i$ is the PA identification number. The holiday vector is given by

$$H = \{H_1, H_2, \ldots, H_M\} \tag{6}$$

and it is represented in binary mode by strings of 0/1, i.e. 0 for normal days and 1 for holidays.
An important input of the NLF framework is the vector of each resident's presence over the last M time steps, given by

$$PR_r = \{PR_{r,1}, PR_{r,2}, \ldots, PR_{r,M}\} \tag{7}$$

where $r$ is the identification number of the resident. Specifically, the elements $PR_{r,1}, \ldots, PR_{r,M}$ take binary values of 0/1, i.e. 0 when the resident is not in the house and 1 when he/she is in the house, or null values when there is no information on the resident's presence. The null values are filled by the SSC k-means algorithm, which is presented in Subsection III-B. Finally, the vectors of the rooms' temperature and humidity over the last M time steps are considered by the NLF system, since they affect the energy consumption of the CAs, and are defined, respectively, as

$$RT_{rm} = \{RT_{rm,1}, RT_{rm,2}, \ldots, RT_{rm,M}\}, \qquad RH_{rm} = \{RH_{rm,1}, RH_{rm,2}, \ldots, RH_{rm,M}\}$$
Since the input data represent quantities with different physical characteristics, they are scaled to the range [0, 1] using the appropriate base value for each input variable. Specifically, min-max normalization is adopted for the input variables $E$, $RT_{rm}$, $RH_{rm}$, $OT$, $OH$ and $SI$, while one-hot encoding is applied to $PA_i$, $CA_j$, $T$, $D$, $H$ and $PR_r$.
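The two scaling operations named above can be sketched as follows. This is a minimal illustration, assuming that the base values `lo`/`hi` of each continuous variable are taken from the training data and that out-of-range values seen later are clipped to [0, 1]; neither choice is specified in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map continuous inputs (E, RT, RH, OT, OH, SI) to [0, 1] with base values lo/hi."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    # Clip so that values outside the training range still lie in [0, 1].
    return [min(max((v - lo) / span, 0.0), 1.0) for v in values]


def one_hot(values: Sequence[int], n_classes: int) -> List[List[int]]:
    """Encode categorical inputs (e.g. day of week D in 0..6) as 0/1 indicator vectors."""
    return [[1 if c == v else 0 for c in range(n_classes)] for v in values]
```

For example, `one_hot([2], 7)` encodes the third day of the week as `[[0, 0, 1, 0, 0, 0, 0]]`, so that the LSTM does not read an artificial ordering into the day index.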

B. Determining the residents' presence in the house utilizing the SSC k-means algorithm
This algorithm aims to relieve the residents of the pressure to report their presence in the house constantly and on time. Thus, they can do so later or may omit it several times, since the SSC k-means algorithm fills the gaps and provides the LSTM with a complete sequence of the residents' presence. The SSC k-means algorithm is a semi-supervised machine learning technique, as per [45], and a separate SSC k-means algorithm is activated for each resident of the building. The data point $X_m$ is defined as a Y-dimensional vector composed of the elements of the input vectors $E$, $PA_i$, $CA_j$, $T$, $D$, $H$, $RT_{rm}$, $RH_{rm}$, $OT$, $OH$ and $SI$ at time step $m$. Thus, the vector $X_1$ for the 1st time step is

$$X_1 = [E_1, PA_{i,1}, CA_{j,1}, T_1, D_1, H_1, RT_{rm,1}, RH_{rm,1}, OT_1, OH_1, SI_1]^{T}$$

The SSC k-means algorithm partitions the data set into $l$ clusters, where each cluster is linked either with the dataset denoting that resident $r$ is in the house or with the dataset denoting that resident $r$ is not in the house. For each resident $r$ and data point $X_m$, a set of indicator variables $R^{r}_{k,m} \in \{0, 1\}$, $k = 1, \ldots, l$, describes the assignment of the data point to a specific cluster, formulated in the vectors

$$R^{r}_{m} = [R^{r}_{1,m}, R^{r}_{2,m}, \ldots, R^{r}_{l,m}]$$

For example, if $X_m$ is assigned to cluster 2, then $R^{r}_{2,m}$ takes the value 1. Also, for each resident $r$, a set of Y-dimensional vectors $\mu^{r}_{k}$ is introduced to represent the center of cluster $k$. According to the confirmed residents' labeled data and the EMS labeled data, the data point $dp_k$ of each cluster that is closest to the cluster's center is linked with the dataset denoting that resident $r$ is in the house or with the dataset denoting that resident $r$ is not in the house. Thus, the $PR_r$ of (7) is filled with the appropriate binary values for each resident. The number of clusters $l$ is proportional to the number of total labeled data, i.e. the sum of the residents' and the EMS labeled data.
The goal of the SSC k-means clustering algorithm is to find, for each resident $r$, the values of the indicator variables $R^{r}_{k,m}$ and of the cluster centers $\mu^{r}_{k}$ that minimize the sum of the squared distances of the data points from their assigned cluster centers,

$$J^{r} = \sum_{m=1}^{M} \sum_{k=1}^{l} R^{r}_{k,m} \, \lVert X_m - \mu^{r}_{k} \rVert^{2}$$

The proposed SSC k-means algorithm is described in pseudocode form in Algorithm 1. As mentioned above, the SSC algorithm is utilized to fill in the incomplete information on the residents' presence in the house. Thus, if the data provided by the residents are complete and reliable, the SSC is bypassed and the data are provided directly to the LSTM. It should be noted that the information provided by the residents, whenever available, improves the accuracy of the SSC algorithm through the updated input data vectors.
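The SSC k-means procedure can be sketched as below. This is one plausible reading of the description above, not a reproduction of Algorithm 1: standard Lloyd iterations minimize $J^r$ for one resident, and each cluster then inherits the label of the labeled point (resident- or EMS-labeled) nearest to its center, preferring labeled points inside the cluster. The random initialization from data points, the fallback to all labeled points, and the names `ssc_kmeans_fill` and `_assign` are assumptions.

```python
import numpy as np


def _assign(X, mu):
    """Index k of the nearest centre for every X_m, i.e. the k with R_{k,m} = 1."""
    return np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)


def ssc_kmeans_fill(X, labels, n_clusters, n_iter=100, seed=0):
    """Fill unknown presence labels (-1) of one resident r.

    X: (M, Y) array of feature-scaled data points X_m.
    labels: (M,) int array, 1 = present, 0 = absent, -1 = unknown.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    labelled = np.flatnonzero(labels >= 0)
    if labelled.size == 0:
        return labels.copy()  # nothing to propagate
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):  # Lloyd iterations that decrease J^r
        assign = _assign(X, mu)
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                           else mu[k] for k in range(n_clusters)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    assign = _assign(X, mu)
    filled = labels.copy()
    for k in range(n_clusters):
        members = np.flatnonzero(assign == k)
        if members.size == 0:
            continue
        # Prefer labelled points inside the cluster; otherwise use all labelled points.
        cand = np.intersect1d(members, labelled)
        if cand.size == 0:
            cand = labelled
        nearest = cand[np.linalg.norm(X[cand] - mu[k], axis=1).argmin()]
        unknown = members[labels[members] < 0]
        filled[unknown] = labels[nearest]  # cluster inherits the nearest label
    return filled
```

Known labels are never overwritten; only the null entries of $PR_r$ are filled, which matches the bypass rule described above when the residents' data are complete.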
The data that the residents provide to the SSC k-means algorithm about their presence in the house are defined as residents' labeled data. However, since the proposed algorithm may be integrated in the EMS of a nZEB, some of the PAs and/or CAs may indirectly provide information on the presence of the residents. For example, the oven, the vacuum cleaner and the dishwasher are usually not activated unless at least one resident is present in the house. Thus, the activation of one of these appliances can be considered a signal of a resident's presence in the house. These data are defined as EMS-based labeled data. Since the proposed NLF algorithm considers the EMS-based labeled data, the required residents' labeled data may be further decreased. Therefore, by properly considering the information provided by the PAs, CAs and EMS, satisfactory accuracy of the SSC k-means algorithm can be attained while reducing the need for the residents to constantly report their presence in the house.
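A minimal sketch of how residents' labeled data and EMS-based labeled data could be merged into the label vector consumed by the SSC k-means algorithm is given below. The rule that a resident's own report takes priority, the set `PRESENCE_IMPLYING` and the function name are assumptions; the paper also does not say how a shared appliance signal is attributed to a specific resident, so the sketch simply marks presence for the resident being processed.

```python
from typing import Dict, List, Optional

# Hypothetical set of appliances whose activation implies someone is at home.
PRESENCE_IMPLYING = {"oven", "vacuum_cleaner", "dishwasher"}


def merge_presence_labels(resident: List[Optional[int]],
                          status: Dict[str, List[int]]) -> List[int]:
    """Combine residents' labeled data with EMS-based labeled data.

    resident[m]: 1/0 if the resident reported presence/absence, None otherwise.
    status[name][m]: 1 if appliance `name` was active in time slot m.
    Returns 1/0 where a label is known and -1 where SSC k-means must fill it.
    """
    merged = []
    for m, reported in enumerate(resident):
        if reported is not None:
            merged.append(reported)  # resident's own label has priority
        elif any(status[a][m] for a in PRESENCE_IMPLYING if a in status):
            merged.append(1)         # EMS-based label: presence-implying appliance on
        else:
            merged.append(-1)        # unknown; left to the SSC k-means algorithm
    return merged
```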
The accuracy of the SSC k-means algorithm in compensating for missing information on the residents' presence in the house, with respect to the percentage of confirmed residents' labeled data, is validated in the case study illustrated in Fig. 3. To evaluate the importance of the PAs and CAs as inputs of the proposed algorithm, its accuracy is compared for the cases where the PAs, CAs and EMS labeled data are or are not considered as inputs. In the examined case study, the parameter M is equal to 96, an electric oven, a washing machine and a dishwasher are considered as PAs, and the heat pump and the air conditioner are considered as CAs. The model of the examined nZEB is given in Table I. As can be seen in Fig. 3, the accuracy of the algorithm increases as the percentage of confirmed residents' labeled data increases, as expected. However, since the operation of the PAs and CAs is highly correlated with the residents' presence in the house, the same accuracy of the SSC k-means algorithm can be attained with fewer confirmed residents' data when the PA and CA activations/deactivations are provided as inputs. Specifically, an accuracy of 93% is accomplished with 14% confirmed residents' data when the PAs and CAs are not provided as inputs (blue curve), while the same accuracy is attained with 6% confirmed residents' data if the PAs and CAs are given as inputs (dotted red curve). Moreover, the confirmed residents' data can be reduced further if, in addition to the PAs and CAs, the EMS labeled data are given as inputs to the SSC k-means algorithm. As can be observed, the same accuracy is achieved with only 2.8% confirmed residents' data if both the PA-CA and the EMS labeled data are given as inputs to the algorithm (black curve).
Therefore, in the latter case, not only are fewer residents' data required, but the residents' comfort is also less affected; alternatively, higher accuracy is attained with the same percentage of confirmed residents' labeled data.
It should be noted that a minimum percentage of confirmed residents' labeled data is required for the outcome of the SSC k-means algorithm to be satisfactorily accurate. This percentage depends on the variations in the resident's lifestyle pattern, and a typical value ranges from 2.5% to 3.5% for an average resident.

C. Forecasting the NAs' energy consumption utilizing the LSTM neural network framework algorithm
The LSTM is an advanced form of recurrent neural network that replaces the simple neurons of the original recurrent network with cells of a more complex internal structure. An important characteristic of the LSTM is that the output at time step m−1 becomes an input for the current time step m. This is very useful for the NAs' forecasting problem, since the prediction is based on input data from previous time steps.
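The equations that describe the mathematical structure of the LSTM, as illustrated in Fig. 4, are the following:

$$f_m = \sigma(w_{fx} x_m + w_{fh} h_{m-1} + b_f)$$
$$i_m = \sigma(w_{ix} x_m + w_{ih} h_{m-1} + b_i)$$
$$g_m = \varphi(w_{gx} x_m + w_{gh} h_{m-1} + b_g)$$
$$o_m = \sigma(w_{ox} x_m + w_{oh} h_{m-1} + b_o)$$
$$s_m = g_m \odot i_m + s_{m-1} \odot f_m$$
$$h_m = \varphi(s_m) \odot o_m$$

The notations $f_m$, $i_m$ and $o_m$ represent the forget, input and output gates, respectively; $g_m$ is the input node; $b_f$, $b_i$, $b_g$ and $b_o$ are the forget gate, input gate, input node and output gate biases, respectively; $x_m$ is the input vector at time step $m$; $h_{m-1}$ and $h_m$ are the intermediate outputs of the previous and current time steps; $s_m$ and $s_{m-1}$ describe the cell state at the current and previous time frames, respectively; and $w_{fx}$, $w_{fh}$, $w_{ix}$, $w_{ih}$, $w_{gx}$, $w_{gh}$, $w_{ox}$ and $w_{oh}$ are the weight matrices for the corresponding inputs of the network activation functions. The symbol $\odot$ denotes element-wise multiplication, and $\sigma$ and $\varphi$ denote the sigmoid and tanh activation functions, respectively.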
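A single LSTM time step following the gate equations above can be written directly in NumPy. This is a minimal forward-pass sketch for illustration only: it omits training, and the weight layout (a dict of `(w_kx, w_kh)` pairs per gate), the 7 input features and the 16 hidden units in the usage lines are hypothetical choices, not the configuration used in the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, the sigma of the gate equations."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_m, h_prev, s_prev, W, b):
    """Advance one LSTM cell by one time step; returns (h_m, s_m).

    W[k] = (w_kx, w_kh) and b[k] = b_k for gate k in "f", "i", "g", "o".
    """
    def pre(k):
        w_x, w_h = W[k]
        return w_x @ x_m + w_h @ h_prev + b[k]

    f_m = sigmoid(pre("f"))         # forget gate
    i_m = sigmoid(pre("i"))         # input gate
    g_m = np.tanh(pre("g"))         # input node, phi = tanh
    o_m = sigmoid(pre("o"))         # output gate
    s_m = g_m * i_m + s_prev * f_m  # '*' is the element-wise product
    h_m = np.tanh(s_m) * o_m
    return h_m, s_m


# Usage: run one cell over M = 96 past time steps with 7 hypothetical features.
rng = np.random.default_rng(0)
n_in, n_hid = 7, 16
W = {k: (rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid)))
     for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}
h = s = np.zeros(n_hid)
for x_m in rng.random((96, n_in)):
    h, s = lstm_step(x_m, h, s, W, b)
```

With all weights and biases at zero, every gate equals 0.5 and the input node is 0, so the cell state simply halves at each step; this is a quick way to check that the forget path is wired as in the $s_m$ equation.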
The operation of the proposed LSTM-based forecasting framework is described in Fig. 5. As can be seen, the input is the matrix X of the feature-scaled data, which comprises the subset of vectors described in Subsection III-A (i.e. $E$, $PA_i$, $CA_j$, $T$, $D$, $H$) and the $PR_r$ of the SSC explained in Subsection III-B, while the output is a set of forecasting sequences of the NAs' energy consumption.
Owing to the sequential nature of the LSTM procedure, an arbitrary number of LSTM layers can be stacked to form a deep learning network. The outputs of the top LSTM layer feed a conventional feedforward neural network, which maps the intermediate LSTM outputs to a single value: the forecast of the NAs' energy consumption for the target time interval. All weights and biases are updated by minimizing the differences between the LSTM outputs and the actual training samples.
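The stacking described above, where the output sequence of one LSTM layer is the input sequence of the next and a feedforward head maps the last hidden state of the top layer to one forecast value, can be sketched as a NumPy forward pass. Training is not shown, and the layer count, layer sizes, weight initialization and function names are hypothetical; the paper's actual configuration is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layer(rng, n_in, n_hid):
    """Gate weights stacked as [f; i; g; o] for one LSTM layer (hypothetical init)."""
    return {"Wx": rng.normal(0, 0.1, (4 * n_hid, n_in)),
            "Wh": rng.normal(0, 0.1, (4 * n_hid, n_hid)),
            "b": np.zeros(4 * n_hid)}


def run_layer(layer, xs):
    """Run one LSTM layer over a sequence xs of shape (T, n_in); return all h_m."""
    n_hid = layer["Wh"].shape[1]
    h, s, outs = np.zeros(n_hid), np.zeros(n_hid), []
    for x in xs:
        z = layer["Wx"] @ x + layer["Wh"] @ h + layer["b"]
        f, i, o = (sigmoid(z[k * n_hid:(k + 1) * n_hid]) for k in (0, 1, 3))
        g = np.tanh(z[2 * n_hid:3 * n_hid])
        s = g * i + s * f
        h = np.tanh(s) * o
        outs.append(h)
    return np.array(outs)


def forecast(layers, head_w, head_b, xs):
    """Stacked LSTM layers followed by a linear feedforward output head."""
    seq = xs
    for layer in layers:  # the output sequence of one layer feeds the next
        seq = run_layer(layer, seq)
    return float(head_w @ seq[-1] + head_b)  # NAs' energy for the target interval


# Usage: two stacked layers, M = 96 past steps, 7 hypothetical input features.
rng = np.random.default_rng(1)
layers = [init_layer(rng, 7, 32), init_layer(rng, 32, 32)]
y_hat = forecast(layers, rng.normal(0, 0.1, 32), 0.0, rng.random((96, 7)))
```

Since the head outputs one value, a 48-slot forecast would need either 48 output units or repeated application per target interval; which option the paper uses is not stated in this section.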
The performance of the proposed LSTM neural network framework is illustrated in Fig. 6, where the case study of Fig. 3 (Table I) has been adopted for the SSC k-means algorithm, using both the PA-CA and the EMS labeled data as inputs. Two cases are examined. In the first case, the residents provide a small amount of data on their presence at home (the percentage of total labeled data is 10%, blue curve), while in the second case, the residents provide an increased amount of data (the percentage of total labeled data is 80%, black dotted curve); in both cases, the rest is estimated by the SSC k-means algorithm. As can be seen in Fig. 6, the accuracy is only slightly higher when 80% of the total labeled data is provided by the residents than when 10% is provided, and the performance of the proposed NLF algorithm is satisfactory in both cases compared with the real NAs' curve (dotted red curve).
The contribution of the SSC k-means algorithm to the accuracy of the proposed NLF system is summarized in Table II, which compares the mean absolute error (MAE) of the NAs' load forecasting, in kWh, for the LSTM with and without the SSC k-means algorithm. As can be observed, the MAE is 0.37 kWh for the LSTM without the SSC k-means algorithm, and it is reduced to 0.31 kWh when the SSC k-means algorithm is used.
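For reference, the MAE metric of Table II and the relative improvement implied by the reported values can be computed as follows. The function name is illustrative; the only figures used are the 0.37 kWh and 0.31 kWh reported above, which correspond to a reduction of about 16%.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """MAE = (1/N) * sum_m |E_m - E_hat_m|, in the units of the inputs (kWh here)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Relative MAE reduction reported in Table II: 0.37 kWh -> 0.31 kWh.
improvement = (0.37 - 0.31) / 0.37  # about 0.162, i.e. roughly 16%
```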
Note that the SSC k-means algorithm can always fill in the incomplete information on the residents' presence in the house, even when the minimum percentage of confirmed residents' labeled data is not provided, but at the cost of reduced accuracy. Also, the LSTM can produce a prediction for any chosen number and duration of the time intervals of the prediction and recent past periods. Therefore, the operation of the NLF method is stable, while the sensitivity, and consequently the accuracy, of the obtained results depend on the fulfillment of the aforementioned limitations.

IV. HIL RESULTS
The model of the examined nZEB microgrid is implemented on a dSPACE DS1104 PCI controller board, while the IEMS (the EMS with the proposed NLF) runs on a PC with an Intel Core i7-8700K processor and an NVIDIA RTX 2070 Super GPU. The HiL plant and the PC communicate through digital and analog signals that command the BSS and the appliances, while the measurements are acquired via an RS232 serial interface.
The examined microgrid plant is a pilot nZEB with an area of 120 m² in Polykastro, Greece, inhabited by two adults and a child. The RES of the nZEB consist of PVs and a WT with nominal powers of 2.7 kW and 2 kW, respectively (Table I). The data were collected over a 1-year period (from 1 March 2019 to 1 March 2020) with a sampling time of Δt = 30 min. The BSS consists of a Li-ion battery pack with an energy capacity of 5 kWh. Three PAs are considered: a dishwasher, an oven, and a washing machine.
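As a quick check of the data volume, the number of samples per measured quantity follows from the collection period and Δt. The sketch assumes that 1 March 2020 is an exclusive end date and that no samples are missing; note that the period includes 29 February 2020.

```python
from datetime import datetime, timedelta

start = datetime(2019, 3, 1)  # first day of the data set
end = datetime(2020, 3, 1)    # end date, assumed exclusive
dt = timedelta(minutes=30)    # sampling time Δt

days = (end - start).days     # 366, because 2020 is a leap year
n_samples = (end - start) // dt
slot_hours = dt.total_seconds() / 3600  # energy per slot (kWh) = power (kW) x 0.5 h
print(days, n_samples)        # 366 17568
```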
The EMS control strategy of [34] is adopted in this paper. It aims at the proper scheduling of the PAs and control of the CAs, as well as the optimal management of the energy generated by the RES, by properly choosing whether to self-consume it within the nZEB, inject it into the grid, or temporarily store it in the batteries. The EMS performs these tasks considering the forecast of the NAs' load curve estimated by the proposed NLF system.
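To see why the NAs' forecast influences the choice between self-consumption, storage, and grid injection, consider the simplified greedy rule sketched below. It is NOT the control strategy of [34]; it is a hypothetical single-step rule with a lossless battery, the 5 kWh capacity of the pilot nZEB, the 80% SoC ceiling mentioned in the results, an assumed 20% SoC floor, and Δt = 0.5 h. Sign conventions: `p_grid > 0` means power drawn from the grid, `p_bss > 0` means battery discharging.

```python
def dispatch_step(p_res, p_load, soc, capacity_kwh=5.0,
                  soc_min=0.2, soc_max=0.8, dt_h=0.5):
    """One greedy dispatch step (hypothetical rule, lossless battery).

    Returns (p_grid, p_bss, new_soc), powers in kW and SoC in p.u.
    p_grid > 0: drawn from the grid; p_grid < 0: injected into the grid.
    p_bss > 0: battery discharging; p_bss < 0: battery charging.
    """
    net = p_res - p_load  # surplus (> 0) or deficit (< 0) after self-consumption
    if net >= 0:
        # Store the surplus until soc_max is reached; the rest goes to the grid.
        room_kwh = max(0.0, (soc_max - soc) * capacity_kwh)
        p_bss = -min(net, room_kwh / dt_h)
    else:
        # Cover the deficit from the battery down to soc_min; buy the rest.
        avail_kwh = max(0.0, (soc - soc_min) * capacity_kwh)
        p_bss = min(-net, avail_kwh / dt_h)
    new_soc = soc - p_bss * dt_h / capacity_kwh
    p_grid = p_load - p_res - p_bss  # power balance: p_res + p_bss + p_grid = p_load
    return p_grid, p_bss, new_soc


# 3 kW of RES, 1 kW of load, battery at 70%: 1 kW fills it to 80%, 1 kW is injected.
print(dispatch_step(p_res=3.0, p_load=1.0, soc=0.7))
```

If the load assumed by the EMS is lower than the real one, such a rule charges the battery earlier and reaches the SoC ceiling sooner, after which every further surplus is injected into the grid.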
The data are split into three subsets: the first (training) subset is used to train the forecasting models of the proposed NLF, the second (validation) subset is used to evaluate the performance of the different models, and the third (test) subset is used to evaluate the results. According to the consistent finding in [46] for LSTM algorithms, multiple layers always work better than a single layer, and the number of hidden nodes should be sufficiently large. Thus, it has been decided to implement the LSTM with multiple layers and a sufficiently large number of hidden nodes.

To demonstrate the effectiveness of the NLF and the operating improvements of the IEMS, three scenarios are illustrated for one arbitrarily chosen day of the 1-year period. For the whole 1-year period and for all the examined scenarios, the residents directly provided information about their presence in the house for 6% of the total cases on average (the total labeled data are 10%, of which 4% are EMS labeled data). Specifically, the performance of the IEMS (combined EMS and NLF) on the tested nZEB is illustrated in Fig. 7, which can be compared with the ideal case where the EMS runs knowing a priori the real energy consumption of the NAs (Fig. 8), and with the case where the EMS operates considering a predefined fixed curve for the NAs' energy consumption (Fig. 9). Note that the case of Fig. 8 is not realistic, but it serves as a reference to evaluate the IEMS performance of Fig. 7. The case of Fig. 9, in which the NAs' load curve is considered constant, is the one usually applied in the literature, so the comparison of Fig. 7 with Fig. 9 validates the advantages of the NLF. For all the above cases, the energy generated by the RES is the same in order to have a common comparison base (1st diagram of all figures, where the blue and red curves correspond to the power generated by the PVs and the WT, respectively).
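The split proportions of the three subsets are not stated in this section, so the sketch below assumes 70/15/15 purely for illustration. It splits the samples chronologically rather than randomly, which keeps future observations out of the training subset of a time-series forecaster; the function name `chronological_split` is hypothetical.

```python
import numpy as np


def chronological_split(data, train=0.70, val=0.15):
    """Split time-ordered samples into train/validation/test subsets.

    The ratios are assumptions; no shuffling is done, so the validation and
    test subsets always lie after the training subset in time.
    """
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


samples = np.arange(17568)  # placeholder indices for one year of 30-min records
train_set, val_set, test_set = chronological_split(samples)
print(len(train_set), len(val_set), len(test_set))
```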
As can be seen, the PAs operate in different time intervals (4th diagram of all figures), since the EMS provides a different schedule for each examined case. The same holds for the power provided/absorbed by the grid (5th diagram of all figures) and for the power and SoC of the BSS (6th and 7th diagrams of all figures). Thus, a different energy transaction with the grid, E_Grid, is observed for each examined case; it is shown in the 8th diagram of all figures and is defined as the sum of the absolute values of the energy consumed from and injected into the grid. In the 5th diagrams, positive values of the power P_Grid correspond to energy drawn from the grid, while negative values correspond to energy supplied to the grid. In the 6th diagrams, positive values of the power P_BSS correspond to energy delivered by the batteries (discharging), while negative values correspond to energy stored in the batteries (charging).
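With the sign convention above and Δt = 30 min, the cumulative E_Grid of the 8th diagrams can be obtained from the sampled P_Grid series as sketched below. The function names are illustrative, and the powers are assumed constant within each interval.

```python
import numpy as np


def grid_energy_exchange(p_grid_kw, dt_h=0.5):
    """Cumulative energy traded with the grid, E_Grid, in kWh.

    p_grid_kw > 0: power drawn from the grid; < 0: power injected.
    Both directions count, so absolute values are accumulated.
    """
    p = np.asarray(p_grid_kw, dtype=float)
    return np.cumsum(np.abs(p) * dt_h)


def bought_sold(p_grid_kw, dt_h=0.5):
    """Split the exchanged energy into purchased and sold parts (kWh)."""
    p = np.asarray(p_grid_kw, dtype=float)
    bought = float(np.sum(np.clip(p, 0, None)) * dt_h)
    sold = float(np.sum(np.clip(-p, 0, None)) * dt_h)
    return bought, sold


p_grid = [1.0, -2.0, 0.5, 0.0]        # four 30-min intervals
print(grid_energy_exchange(p_grid))   # [0.5  1.5  1.75 1.75]
print(bought_sold(p_grid))            # (0.75, 1.0)
```

The final value of E_Grid always equals purchased plus sold energy, so a lower E_Grid can come from buying less, selling less, or both.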
In Fig. 7, comparing the two curves of the 3rd diagram, it is concluded that the NLF system successfully predicts the NAs' energy consumption, since the blue curve, which corresponds to the predicted NAs' energy consumption, is close to the real NAs' energy load (red curve). Also, comparing the EMS performance of Fig. 7 with that of Fig. 8, which corresponds to the ideal reference case, it follows that satisfactory energy management of both the RES and the BSS is accomplished, since the BSS power and SoC (6th and 7th diagrams) and the total energy traded with the grid (8th diagram) of Fig. 7 are very close to those of the ideal case of Fig. 8. The energy exchanged with the grid is slightly higher with the NLF than in the ideal case, due to some discrepancies in the NAs' forecast (8th diagrams). The advantages of the proposed NLF are revealed by comparing Fig. 7 (proposed NLF) with Fig. 9 (the method usually applied in the literature, where the NAs' load curve is considered constant). As can be observed in the 2nd diagram of Fig. 9, although the power load of the nZEB is low between 10:00 and 11:30, the SoC of the BSS has already reached its maximum value of 80% (7th diagram), and consequently the only option is to inject the excess energy generated by the RES into the grid. In contrast, in the case of Fig. 7 the SoC of the BSS remains at its maximum value for a much shorter period, and therefore less energy is traded with the grid. Hence, the proposed NLF (Fig. 7) attains better management of the BSS than the case where a fixed NAs' load curve is considered (Fig. 9). Specifically, at the end of the examined day, the energy exchanged with the grid in Fig. 9 is higher by 5 kWh than with the proposed NLF (Fig. 7) and by 7 kWh than in the ideal case of Fig. 8 (compare the 8th diagrams of these figures).
Since the price of selling electric energy to the grid is lower than the price of buying energy from it, every kWh of RES surplus that is injected into the grid and later bought back costs the building the difference between the two prices; therefore, the lower energy exchange of the proposed NLF technique results in a lower cost than the fixed-load-curve case. This conclusion has also been validated for a short period of one month (Fig. 10) and a long period of six months (Fig. 11). As can be seen in Fig. 10, the energy exchanged with the grid with the proposed NLF technique is 501 kWh, whereas it increases to 559 kWh with the NAs' fixed load curve. Similar results are obtained for the longer period of six months (Fig. 11): the energy exchanged with the grid with the proposed IEMS is 4,126 kWh, while it increases to 4,798 kWh with the NAs' fixed load curve. Therefore, because the energy buying price is higher than the selling price, the proposed NLF system provides economic benefits for the building, corresponding to a reduction of the energy exchanged with the grid of 58 kWh over one month and 672 kWh over six months. Moreover, since a larger energy exchange between the building and the electric grid increases the energy losses, the proposed NLF technique increases the efficiency of the nZEB's microgrid. Finally, by reducing the energy exchange with the grid, the proposed NLF technique contributes to the stability and reliability of the electric power system. The energy exchange with the grid for the three examined case studies (very short, short, and long periods) is reported in Table III. From the above, it is concluded that the NLF can improve the EMS with respect to both the energy transaction with the grid and the proper control of the BSS for the maximum exploitation of the energy generated by the RES, with several economic and technical benefits.
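The reductions quoted for Figs. 10 and 11 can be reproduced with the short calculation below, together with a sketch of the round-trip cost argument. The tariff values are hypothetical, and because the reported figures do not split E_Grid into purchased and sold energy, the monetary saving itself cannot be derived from them.

```python
# Energy exchanged with the grid (kWh), as reported for Figs. 10 and 11.
reported = {
    "1 month": {"nlf": 501.0, "fixed": 559.0},
    "6 months": {"nlf": 4126.0, "fixed": 4798.0},
}


def exchange_reduction(e_nlf, e_fixed):
    """Absolute (kWh) and relative reduction of E_Grid obtained with the NLF."""
    saved = e_fixed - e_nlf
    return saved, saved / e_fixed


def round_trip_cost(energy_kwh, p_buy, p_sell):
    """Net cost of selling RES surplus and buying the same energy back later."""
    return energy_kwh * (p_buy - p_sell)


for period, e in reported.items():
    saved, rel = exchange_reduction(e["nlf"], e["fixed"])
    print(f"{period}: {saved:.0f} kWh less exchanged ({rel:.1%})")
# 1 month: 58 kWh less exchanged (10.4%)
# 6 months: 672 kWh less exchanged (14.0%)

# Hypothetical tariffs per kWh: buying costs more than selling.
print(round_trip_cost(1.0, p_buy=0.20, p_sell=0.05))
```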

V. CONCLUSIONS
In this paper, an improved NLF system for the accurate forecasting of the NAs' energy consumption is proposed, based on an LSTM recurrent neural network framework in conjunction with a semi-supervised clustering technique. The LSTM is the main forecasting technique for the NAs, while the SSC k-means is an auxiliary algorithm that aims to relieve the residents of the need to constantly inform the system about their presence in the house. With the proposed SSC k-means algorithm, fewer confirmed residents' labeled data are required, and therefore the residents' comfort is less affected. Moreover, the proposed LSTM technique with the SSC k-means achieves a lower mean absolute error than the LSTM without SSC k-means (0.31 kWh versus 0.37 kWh). Since the proposed NLF technique can be utilized by the EMS of a nZEB, the stability and reliability of the building microgrid can be enhanced by reducing the energy exchange with the grid and by properly controlling the BSS, attaining maximum exploitation of the energy generated by the RES with several economic and technical benefits. The implementation of the NLF requires no extra energy meters, sensors, or any other additional hardware, but only amendments to the EMS firmware so that the outcome of the NLF is considered in its calculations. The feasibility and effectiveness of the proposed NLF in improving the EMS's performance have been validated in a HiL system of a pilot nZEB, and selective results are presented to demonstrate the operating improvements.
Future work includes the extension of the proposed methodology to a master/slave topology for application to blocks of nZEBs and large-area microgrids. Specifically, a master algorithm will control the outcomes of the IEMSs, which act as slave algorithms, in order to maximize the exploitation of the RES and BSS of the block of nZEBs and of the microgrid.