Occupancy Heat Gain Detection and Prediction Using Deep Learning Approach for Reducing Building Energy Demand

The use of fixed or scheduled setpoints combined with varying occupancy patterns in buildings could lead to spaces being over or under-conditioned, which may lead to significant waste in energy consumption. The present study aims to develop a vision-based deep learning method for real-time occupancy activity detection and recognition. The method enables predicting and generating real-time heat gain data, which can inform building energy management systems and heating, ventilation, and air-conditioning (HVAC) controls. A faster region-based convolutional neural network was developed, trained and deployed to an artificial intelligence-powered camera. For the initial analysis, an experimental test was performed within a selected case study building's office space. Average detection accuracy of 92.2% was achieved for all activities. Using building energy simulation, the case study building was simulated with both ‘static’ and deep learning influenced profiles to assess the potential energy savings that can be achieved. The work has shown that the proposed approach can better estimate the occupancy internal heat gains for optimising the operations of building HVAC systems.


INTRODUCTION AND LITERATURE REVIEW
The built environment sector accounts for a significant proportion of global energy use and energy-related emissions [1]. It is responsible for up to 35% of the total final energy consumption and is increasing fast [2]. Reducing buildings' energy consumption is crucial for meeting global carbon emission reduction targets and will require innovative methods. Major energy consumers in buildings include heating, ventilation and air-conditioning (HVAC), hot water, lighting and appliances. HVAC systems and their associated operations are responsible for up to 40% of the total consumption [3], and this is even higher in areas with harsh or extreme climates. Enhancing the efficiency or minimising the consumption of such systems will go a long way towards developing a low-carbon economy and future. Solutions such as occupancy-based controls can achieve significant energy savings by eliminating unnecessary energy usage.
A significant element affecting the usage of these energy consumers is the occupants' behaviour [4]. For instance, rooms in offices or lecture theatres are not fully utilised or occupied during the day, and in some cases, some rooms are routinely unoccupied. Current standards and guidelines such as ASHRAE 90.1 [5] and ASHRAE 55 [6] suggest a generalised setpoint range and schedule for room heating and cooling during occupied and unoccupied hours. For example, during occupied hours, they suggest 22-27 °C for cooling and 17-22 °C for heating, while during unoccupied hours, they suggest 27-30 °C for cooling and 14-17 °C for heating. However, according to Papadopoulos [7], these HVAC setpoint configurations must be revised when applied to commercial buildings. The use of fixed or scheduled setpoints combined with varying occupancy patterns could lead to rooms frequently being over- or under-conditioned. This may lead to significant waste in energy consumption [8] and can also impact thermal comfort and satisfaction [9].
Delzendeh et al. [10] also suggested that the impact of occupancy behaviour has been overlooked in current building energy performance analysis tools. This is due to the challenges in modelling the complex and dynamic nature of occupants' patterns, which are influenced by various internal and external, individual and contextual factors. Peng et al. [11] collected occupancy data from various offices and commercial buildings and identified that occupancy patterns vary between different office types. Multi-person office spaces regularly achieve occupancy rates of over 90%. However, private, single-person offices rarely achieve an occupancy rate of over 60%. Meanwhile, equipment or appliances in offices can be kept in operation during the entire working day, irrespective of occupancy patterns [12]. The study by Chen et al. [13] highlighted that occupancy behaviour is a major contributing factor to discrepancies between the simulated and actual building performance. In current building energy simulation (BES) programs, the occupancy information inputs are also static and lack diversity, contributing to discrepancies between the predicted and actual building energy performance.
This indicates the need to develop solutions such as demand-driven controls that adapt to occupancy patterns in real time and optimise HVAC operations while also providing comfortable conditions [14]. These systems take advantage of occupancy information to reduce energy consumption by optimising the scheduling of the HVAC and other building systems such as passive ventilation [15] and lighting [16]. Energy can be saved using demand-driven solutions by (1) adjusting the setpoints to reduce the temperature difference between the outdoor and air-conditioned indoor space and (2) reducing the operation time of the systems.
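As a minimal sketch of saving mechanism (1), the snippet below switches between the occupied and unoccupied setpoint bands quoted in the introduction according to a detected occupant count. The function name `setpoint_band` and the simple "any occupant means occupied" rule are illustrative assumptions and do not describe a control strategy used in this study.

```python
# Setpoint bands in deg C, as quoted in the introduction for occupied and
# unoccupied hours. The selection rule below is an illustrative assumption.
SETPOINT_BANDS_C = {
    ("occupied", "cooling"): (22.0, 27.0),
    ("occupied", "heating"): (17.0, 22.0),
    ("unoccupied", "cooling"): (27.0, 30.0),
    ("unoccupied", "heating"): (14.0, 17.0),
}


def setpoint_band(occupant_count, mode):
    """Return the (low, high) setpoint band for the detected occupancy state."""
    if mode not in ("cooling", "heating"):
        raise ValueError("mode must be 'cooling' or 'heating'")
    # Assumption: a space counts as occupied as soon as one person is detected.
    state = "occupied" if occupant_count > 0 else "unoccupied"
    return SETPOINT_BANDS_C[(state, mode)]
```

Relaxing the heating band from 17-22 °C to 14-17 °C whenever the space is empty reduces the indoor-outdoor temperature difference during a cold period, which is the effect described in point (1).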
The integration of occupancy information into building HVAC operations can lead to energy savings [17]. The occupancy detection and monitoring approach proposed by Erickson and Cerpa [18] employed a sensor network of cameras within underutilised areas of a building and was shown to provide average annual energy savings of 20.0%, and savings of 26.5% during the winter months. The study by Shih [19] highlighted that offline strategies for pre-defined control parameters cannot handle all variations of building configurations, particularly the large numbers of humans and their various behaviours.
Information on real-time occupancy patterns is central to the effective development and implementation of a demand-driven control strategy for HVAC [20]. Several sensors and technologies [21] can be used to measure and monitor real-time occupancy. Nagy et al. [22] presented the use of motion sensors to monitor occupancy activity throughout the day. Various types of environmental sensors have been employed in buildings for automation and controls, temperature and ventilation control, fire detection, and building security systems [23]. Wearable-based technologies have become increasingly popular for human detection and activity analysis in the indoor environment [24]. Furthermore, Wi-Fi enabled internet of things (IoT) devices are increasingly being used for occupancy detection [25]. To some extent, these sensor-based solutions provide accurate detection of occupancy patterns. Previous works, including [20,25], have shown the capabilities of these strategies in sensing occupancy information through the count and location of occupants in spaces, which aids demand-driven control systems.
However, there is limited research on sensing the occupants' actual activities, which can affect the indoor environment conditions [26,27]. The activities of occupants can affect the internal heat gains (sensible and latent heat) in spaces directly [26] and can indirectly affect other types of internal heat gains [27]. Real-time and accurate predictions of the heat emitted by occupants at various activity levels can be used to better estimate the actual heating or cooling requirements of a space. A potential solution is to use artificial intelligence (AI) based techniques such as computer vision and deep learning to detect and recognise occupants' activities [28].

Literature Gap and Novelty
Several works [29, 30] have already implemented vision-based deep learning methods to identify human activities and have been shown to be capable of learning features from new sensor data and predicting the associated movement. Most of these studies attempted to improve the performance and accuracy of the deep learning model for human presence detection and activity classification rather than using the data to seek solutions that minimise unnecessary energy loads associated with buildings. Furthermore, no work has attempted to predict the associated sensible and latent heat emission from the occupants, which affects the temperature and humidity levels in an internal space. In addition, few studies have tested vision-based deep learning methods in an actual office environment and assessed their performance in terms of energy savings and indoor environment quality. Finally, the heat emission profiles generated can also be used as input for building energy simulation (BES) tools, increasing the reliability of results, since the unpredictability of occupant behaviour is one of the parameters that create difficulties for BES.

Aims and Objectives
The present work aims to address the research gaps by using a vision-based deep learning method that enables the real-time detection and recognition of multiple occupants' activities within office building spaces. A faster region-based convolutional neural network (Faster R-CNN) was used to enable training of a classification model which was deployed to a camera for detecting occupancy activities. This method can identify multiple occupants within an indoor space and the activities performed by each. Validation of the developed deep learning model is conducted by using a set of testing data, and the accuracy and suitability for live detection were also evaluated. Experiments are carried out within a case study office room to test the proposed approach's capabilities and accuracy. Using BES, the case study building was simulated with both 'static' and deep learning influenced profiles (DLIP) to assess the potential energy savings that can be achieved.

METHOD
The following section presents an overview of the research method with the corresponding details for each stage of the proposed framework to develop a vision-based method for detecting and recognising occupancy activities.

Overview of Research Method
Figure 1 presents an overview of the research method. It consists of three main sections. Section 1 (highlighted in green) is the formation and application of a deep learning model for occupancy activity detection and recognition. The model, based on a convolutional neural network (CNN), was trained, validated and deployed to an AI-powered camera. Section 2 is the formation of the deep learning influenced profiles (DLIP) using the live occupancy detection within the office space. The DLIP can be fed into a building energy management system and the controls of the building heating, ventilation and air-conditioning (HVAC) system to make adjustments based on the actual building conditions while minimising unnecessary loads. However, for the initial analysis (yellow boxes), the DLIP profiles were inputted into building energy simulation to identify potential reductions in building energy consumption and changes within the indoor environment (Section 3). Further details of the steps described in Figure 1 are discussed in the next sub-sections.
Compared with shallow learning methods, deep learning techniques can lead to better performance in detecting and recognising objects. Many studies [31,32] showed that deep learning models with a CNN-based architecture could perform computer vision tasks with high accuracy. The CNN is a class of deep learning network that is extensively used for image-based classification and recognition applications. Compared with other machine learning-based classification techniques, a CNN takes input data in the form of videos or images and can feed the data in its original form directly into the model. Instead of requiring complex pre-processing stages, features can be derived and extracted directly from selected parts of an image [33]. Therefore, the CNN algorithm was selected in this study.
In general, the CNN architecture consists of a feedforward network in which input data such as an image is processed through the network. The features of the input images are first extracted within the convolutional layers, and then the spatial volume of the input data is reduced in the pooling layer. The fully connected (FC) layer, which involves weights, biases, and neurons, is then trained to classify images into different categories. The output layer then delivers the outcome of the calculations and extractions. The configuration of these layers is presented in the form of groups, indicated as stacked modules, to present the structure of a deep learning model. The rectified linear unit (ReLU) layer offers advantages due to its simple function and sparse features, which can minimise training duration. Furthermore, the softmax layer provides a further constraint to aid the training of the model. Both the ReLU and softmax layers are essential to building CNN architectures for various applications. These include vision-based applications such as object detection [34] and face recognition [35], as well as data analysis and other programmatic marketing solutions [36].
As detailed in [37, 38], the convolutional layers are the first layers to extract features from the input data. They play a central role in the architecture by convolving the input data (image), learning the feature representations and extracting them without manual work. Neurons located within each of the convolutional layers are arranged into feature maps. Convolution preserves the relationship between pixels by learning image features from small squares of input data through a mathematical operation. It takes the image matrix and a filter or kernel; the convolutional kernels stride over the whole image, pixel by pixel, to create three-dimensional volumes (height, width and depth) of feature maps, which are passed to the next layer.
Then, the ReLU layer introduces nonlinearity into the output neuron. It is an activation function defined as a piecewise linear function that outputs the input directly when the input is positive and outputs zero when the input is negative. According to LeCun [39], ReLU has become a default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance. Through this, the volume size is not affected, while the nonlinear properties of the decision function are enhanced, resulting in an enrichment of the expressions of an image. Subsequently, the pooling layers reduce the spatial dimensions (width, height) of the feature maps when the images are too large. For this, max pooling, the most common type of spatial pooling, was selected as it performs well on image datasets [40]. It selects the largest element within each receptive field, moving from left to right, so the output's spatial size is reduced.
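The convolution, ReLU and max pooling operations described above can be sketched in a few lines of plain Python. This is only a single-channel illustration with hypothetical function names (`conv2d`, `relu`, `max_pool`), no padding and no learned weights; it is not the network used in this work, which relies on a pre-trained detection model.

```python
def conv2d(image, kernel, stride=1):
    """Slide a kernel over a 2D image (no padding) and return the feature map.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Pass positive values through unchanged and replace negatives with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

Chaining the three calls, `max_pool(relu(conv2d(image, kernel)))`, mirrors one convolution-activation-pooling stage: the ReLU step leaves the feature map size unchanged, while pooling with `size=2` halves its width and height.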
Several convolutional and pooling layers are formed in stacks to enable greater amounts of feature extraction. The fully connected (FC) layers follow on from these layers, flatten the matrix into a vector form, interpret the feature representations and perform the function of high-level reasoning. Combining the features together, the FC layers connect every neuron from one layer to every neuron in another layer. This forms the model and, along with the softmax activation function, it enables the classification of the input images, which generates the classified output results of one of the following occupancy activities.
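The final softmax classification step can be illustrated as follows. The sketch assumes the four detection responses used later in the experiment (walking, standing, sitting and none) and takes a hypothetical list of raw scores from the last FC layer; the function names are illustrative only.

```python
import math

# Response classes used in the live detection experiment.
ACTIVITIES = ["walking", "standing", "sitting", "none"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable activity and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIVITIES[best], probs[best]
```

For example, `classify([0.1, 0.2, 3.0, -1.0])` selects "sitting", since the third score is the largest; the returned probability plays the role of the confidence value displayed above each bounding box.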
Since this approach is designed to be useful for wider applications to solve other problems related to occupant detection within buildings [44], the deep learning model (Figure 2) was developed and tested following the steps given in Figure 3 to provide a vision-based solution. Part 1 consists of the process of data collection and model training. Images of various types of occupancy activities are collected and processed through manual labelling of the images. Through the analysis of various types of deep learning models, the most suitable type of convolutional neural network-based deep learning model was selected. This was configured specifically for this type of detection approach to provide the model outlined in
The number of images within the datasets followed the rule of thumb and suggestion given by Ng [45]. Table 1 presents the number of images used within the initial development and the image categories based on the selected activity responses. Further development of the method will be carried out in future works by building larger datasets with a greater number of responses and predictions.
All images obtained were pre-processed to the desired format before being used for model training. The images were manually labelled using the software LabelImg [46]. This is an open-source graphical image annotation tool which allows images to be labelled with bounding boxes to identify the specific regions of interest. In some cases, multiple labels were assigned to a single image, depending on its content. Hence, the number of labels given in Table 1 was greater than the number of images used. Figure 4 shows an example of the images located within the training and testing datasets of various occupancy activities and how the bounding boxes were assigned around the specific region of interest for each image.
To train the convolutional neural network model, the general process requires defining the network architecture layers and training options. Following existing research which utilised the TensorFlow Object Detection API, a transfer learning approach was incorporated into the model configuration. Transfer learning is a learning method that leverages the knowledge learned from a source task to improve learning in a related but different target task [56]. This approach enables the development of an accurate occupancy detection model with a reduced network training time and less input data, while still providing adequate results with high detection and recognition rates. For this occupancy detection model, the network architecture layers were not defined from scratch. Instead, the TensorFlow detection model zoo [57] provided a collection of detection models pre-trained on various large-scale detection-based datasets specifically designed for a wide range of machine-learning research.
For object detection, the R-CNN [58], SSD-MobileNet [59] and YOLO [60] algorithms are the most commonly used. However, if computational time and resources are the priority, SSD would be a better choice. If accuracy is not the priority but the least computational time is required, then YOLO can be employed. Furthermore, the size of the object to be detected can have an impact on the performance of the algorithms. According to the study by Alganci et al. [61], which evaluated the impact of object size on the detection accuracy, YOLO achieved the lowest accuracy for any object size in comparison to SSD and R-CNN, whereas Faster R-CNN achieved the highest accuracy. The difference in performance between the three types of algorithms widens as object size increases. Therefore, to avoid results being dependent on object size, which is important when detecting occupants, Faster R-CNN was selected in the present work.
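Returning to the manual labelling step described earlier in this section, LabelImg saves bounding boxes as Pascal VOC XML files by default. The sketch below shows how such an annotation could be read back with the Python standard library; the sample file name, labels and coordinates are invented for illustration and are not taken from the study's dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout written by LabelImg.
SAMPLE_XML = """
<annotation>
  <filename>office_0001.jpg</filename>
  <object>
    <name>sitting</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>310</ymax></bndbox>
  </object>
  <object>
    <name>standing</name>
    <bndbox><xmin>400</xmin><ymin>40</ymin><xmax>480</xmax><ymax>330</ymax></bndbox>
  </object>
</annotation>
"""


def parse_annotation(xml_text):
    """Return the image file name and a list of (label, (xmin, ymin, xmax, ymax))."""
    root = ET.fromstring(xml_text.strip())
    labels = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        labels.append((obj.findtext("name"), coords))
    return root.findtext("filename"), labels
```

Because one image can carry several `<object>` entries, the total number of labels can exceed the number of images, as noted for Table 1.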
With the substantial benefits of leveraging pre-trained models through a versatile transfer learning prediction and feature extraction approach, an R-CNN model from the TensorFlow detection model zoo directory
Performance evaluation of the trained model is achieved by using the test images assigned from the test dataset (Table 1). A confusion matrix was used to summarise the detection results of the proposed algorithm. True positive (TP) represents the correctly identified activity, and true negative (TN) represents the correct detection of a different activity. False positive (FP) represents the number of instances in which the predicted activity was not true, that is, another activity was wrongly identified as this specific activity. False negative (FN) represents the number of instances in which this specific activity was performed but was predicted to be something else.
Based on the created confusion matrix, evaluation metrics including accuracy, precision and recall are used to evaluate the performance of the object detection algorithm. These are defined in eq. (1)-(3), respectively. Accuracy defines the proportion of the total number of predictions that were correct, while precision can be seen as a measure of exactness or quality. Additionally, recall is a measure of completeness or quantity. However, precision and recall used separately are not sufficient to quantify the detection performance. To balance precision and recall, the F1 score was formed by combining these two measures, as expressed in eq. (4).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
Despite the selection of a robust data-driven algorithm, difficulties in accurately distinguishing between several occupant activities could occur. To overcome these, continuous improvement and development of the deep learning network is necessary to provide sufficiently accurate occupancy activity detection for demand-driven controls. Another drawback of using a vision-based method is that it could raise privacy concerns. The present approach will address this by developing a system that only outputs heat emission profiles instead of actual occupancy information, which can then be inputted into a control system. Further details are given within the next sections.
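As a worked illustration of the accuracy, precision, recall and F1 score described above, the sketch below derives the TP, TN, FP and FN counts for one activity class from a multi-class confusion matrix and then evaluates the standard definitions of the four metrics. The function names are hypothetical and any matrix passed in is the user's own; no values from Figure 8 or Table 4 are reproduced here.

```python
def per_class_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class k.

    matrix[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # class k predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # another class predicted as class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) using the standard definitions."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1
```

Calling `metrics(*per_class_counts(matrix, k))` for each class and averaging the results is one common way to obtain overall figures; whether the averages reported in this work were computed in exactly this way is not stated, so this should be read as an assumption.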

Application of the Deep Learning Model
This section presents the methods required for the application of the deep learning model. It includes the details of the selected case study building and experimental setup, along with the process of live detection and recognition to form the real-time Deep Learning Influenced Profiles (DLIP).
Case Study Building and Experiment Setup. An office space located on the first floor of the Sustainable Research Building at the University Park Campus, University of Nottingham, UK (Figure 5a) was used to perform the initial live occupancy activity detection using the developed deep learning model. This case study building was also used for the initial performance analysis, where the office space was modelled using the BES tool IESVE [68] to further assess the potential of this framework and its impact on building energy loads.
Figure 5c presents the floor plan of the first floor of the building, with the selected office space highlighted. The office space has a floor area of 39 m² with internal dimensions of 9.24 m × 4.23 m and a floor-to-ceiling height of 2.5 m. Figure 5b presents the experimental setup with the 'detection camera' located on one side of the room to enable the detection of occupants situated on the opposite side. The camera used to generate results for the present study was a 1080p camera with a wide 90° field of view. This was connected to a laptop which ran the trained deep learning model. The building operates between the hours of 08:00 and 18:00, and these were the hours selected for the experimental occupancy activity detection using the deep learning model. The building is equipped with natural ventilation (manually operated), along with a simple air-conditioning system that maintains an internal setpoint temperature of 21 °C. The Nottingham, UK weather data was inputted into the building energy simulation model. Based on CIBSE Guide A [69], standard occupancy profiles with sensible and latent heat gains of 70 W/person and 45 W/person were assigned. For the air exchanges, the infiltration rate was set to 0.1 air changes per hour.
Live Detection and Deep Learning Influenced Profile (DLIP) Formation. Using the developed deep learning model, a typical cold period was selected to perform the live occupancy activity detection and recognition to assess the capabilities of the method. A range of activities was performed by the occupants. These include the selected detection response types of walking, standing, sitting, and none for when no occupants are present. During the real-time detection, the output data for each of the detected occupants were used to form the occupancy heat emission profiles (DLIP). The profile consists of values corresponding to each detected activity, coupled with the heat emission data for an average adult performing the different activities within an office space given in Table 2. Figure 6 shows an example of the process of DLIP formation for the live detection of occupancy activities within the selected office space. It presents several snapshots of the recorded frame indicating the detected occupancy activity condition and the percentage of prediction accuracy. A DLIP was formed for each of the detections, giving a total of four DLIPs for this individual experiment.
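The profile formation described above can be sketched as a simple aggregation of per-frame detections into a time series of activities for each detection box. The code below is a minimal sketch under stated assumptions: the 60-second interval, the majority-vote rule within each interval and the function names are all hypothetical choices, not the procedure documented for the DLIP.

```python
from collections import Counter


def build_profile(detections, interval_s=60):
    """Aggregate (timestamp_s, activity) pairs for ONE detection box.

    Returns a list of (interval_start_s, dominant_activity), where the
    dominant activity is the most frequent label within each interval
    (an assumed rule).
    """
    buckets = {}
    for timestamp_s, activity in detections:
        start = int(timestamp_s // interval_s) * interval_s
        buckets.setdefault(start, Counter())[activity] += 1
    return [
        (start, buckets[start].most_common(1)[0][0]) for start in sorted(buckets)
    ]


def occupants_present(activities_at_one_time):
    """Count detection boxes whose activity is anything other than 'none'."""
    return sum(1 for activity in activities_at_one_time if activity != "none")
```

Only activity labels and timestamps are kept, so no image data needs to be stored. Counting the profiles that report an activity other than "none" at a given time gives the number of occupants present in the detection space.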
As indicated in Figure 5b, the selected office space was designed to accommodate eleven occupants, as eleven office workstations were present. However, on the selected experimental test day, only three occupants were present for the majority of the time. This was determined from the number of DLIPs generated. Effectively, this method not only recognises the activities performed by occupants in forming the desired DLIP but can also obtain data on the number of occupants present in the detection space. This could be useful for other types of applications. The detection and recognition results for Detections A, B, C and D, along with the detection of each specific activity, are analysed within the corresponding results section.
Building Energy Simulation. A building energy simulation tool was used to model the office space with the conditions given above. Building energy simulation uses a dynamic thermal simulation of the heat transfer processes between a modelled building and its microclimate. The heat transfer processes of conduction, convection, and radiation between the building fabric elements were modelled, along with the air exchange and heat gains within and around the building's selected thermal space. The equations are fully detailed in our previous work [70,71]. The DLIP building occupancy profile was compared with three other profiles: the Actual Observation Profile and two conventional fixed schedule profiles, Typical Office Profiles 1 and 2. A comparison between the results obtained from these different occupancy profiles enables the analysis of the potential impact of the DLIP profile on the building energy demand. The Actual Observation Profile was formed to assess the accuracy of the DLIP. This profile represents the true occupancy activity performed during the experimental time, enabling verification of the results obtained for the DLIP.
Table 3 summarises the simulation cases and the associated occupancy and building profiles used for the simulation and analysis. The different occupancy profiles were created for comparison with the DLIP, to evaluate the impact of control strategies informed by real-time multiple occupancy activity detections on building energy performance. Cases 1 and 2 follow current building operational systems based on static or fixed control setpoints. Typical Office 1 assumes that the occupants are sitting most of the time during the selected period (sedentary activity), and Typical Office 2 assumes that the occupants are walking most of the time during the selected period. For the simulation cases, maximum sensible and latent occupancy gains of 75 W and 70 W were assigned. This enables all activities performed within the office space to be represented, with walking being the maximum at 100%, followed by standing at 79%, sitting at 64%, napping at 50%, and none (no activity) at 0%. Furthermore, an occupancy density of one was assigned to each of the DLIP and actual observation profiles. However, for the typical office profiles, since the maximum number of occupants present within the room on the selected day was three, this was assigned as the maximum occupancy density for these cases.
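The percentage scaling above can be turned into heat gains per activity with a short calculation. The sketch below assumes that the same activity fraction is applied to both the maximum sensible gain (75 W) and the maximum latent gain (70 W); this reading, and the function names, are assumptions made for illustration rather than a statement of how the simulation profiles were entered.

```python
# Activity levels as fractions of the maximum occupancy gain (from the text).
ACTIVITY_FRACTION = {
    "walking": 1.00,
    "standing": 0.79,
    "sitting": 0.64,
    "napping": 0.50,
    "none": 0.00,
}

MAX_SENSIBLE_W = 75.0  # maximum sensible gain per occupant
MAX_LATENT_W = 70.0    # maximum latent gain per occupant


def heat_gain(activity):
    """Return (sensible_W, latent_W) for one occupant performing an activity."""
    fraction = ACTIVITY_FRACTION[activity]
    # Assumption: one fraction scales both the sensible and latent components.
    return fraction * MAX_SENSIBLE_W, fraction * MAX_LATENT_W


def room_heat_gain(activities):
    """Sum the sensible and latent gains over all detected occupants."""
    gains = [heat_gain(activity) for activity in activities]
    return sum(g[0] for g in gains), sum(g[1] for g in gains)
```

Under this assumption a sitting occupant contributes about 48 W sensible and 44.8 W latent, and a detection of "none" contributes nothing, which is where a DLIP departs from a fixed schedule that assumes continuous occupancy.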

RESULTS AND DISCUSSION
This section presents the initial model training results and the analysis of the experimental results. The section evaluates the application of the real-time occupancy activity detection using the vision-based deep learning approach and the formation of the Deep Learning Influenced Profiles for each of the detected occupants. As detailed in Figure 1, the generated DLIP was intended to inform a demand-driven HVAC control system to optimise building energy performance and conditions. However, prior to the development of such a system, an initial analysis of the feasibility of this method was carried out using BES analysis.

Deep Learning Model Training Results and Performance Evaluation
The initial deep learning model was trained using an NVIDIA GeForce GTX 1080 graphics processing unit (GPU). The training took approximately 6 hours 45 minutes for the total losses to reach the level indicated in Figure 7. These training results were obtained using TensorBoard during the training process. Using Faster R-CNN with InceptionV2 as the training model, training ran for 102,194 steps, with the loss falling from 3.44 to a minimum of 0.01007. Observations made for this proposed approach can be used to compare the performance of different modifications applied in future works. These include the input of more training and test data and variations in the type of model used for training. A greater number of images will be used for testing purposes as the framework is developed further.
Based on the images assigned to the test dataset (Table 1), Figure 8 presents an example of the confusion matrix. It shows that the majority of the images were correctly classified, showing the suitability of the model for occupancy activity classification. Furthermore, Table 4 presents the model performance in terms of the different evaluation metrics. Overall, it suggests that the classification for 'none' (when the occupant is absent) achieved the highest performance and 'standing' achieved the lowest. This is perhaps due to the difficulty in recognising the occupant's body form and shape, as the poses for standing and walking may be confused with each other. Nonetheless, an average accuracy of 97.09% and an F1 score of 0.9270 were achieved.
This model performance evaluation is based on still test images from the given testing dataset. The following experimental detection and recognition results can therefore provide a more valuable analysis: as occupants move progressively, the detection evaluation is based on a more realistic scenario, including the background conditions, environment setting, and realistic occupant behaviour and actions.

Experimental Detection and Recognition Results
Figure 9 presents example snapshots at various times of the day of the experimental test of the detection and recognition of occupants within the selected office space. Based on the setup indicated in Figure 5b, it shows the ability of the proposed approach to detect and recognise occupants. Up to four output detection bounding boxes were present during this experimental detection, and the accuracy for each detection was also presented above the output bounding boxes. As shown by the snapshots in Figure 9, the size and shape of these bounding boxes varied between each detection interval. This depends on the size of the detected space, the distance between the camera and the detected person, and the occupant's activity. In practice, these images will not be saved within the system; instead, real-time data (for example, at 1-minute intervals) on occupancy number and activities (heat gains) is outputted by the system in numerical and text-based form.
Figure 9. Example snapshots at various times of the day of the experimental test of the detection and recognition of occupants within an office space using the deep learning occupancy activity detection approach
Figure 10 presents the overall detection performance of the proposed approach during the experimental test. The results showed that the approach provided correct detections 97.32% of the time, incorrect detections 1.98% of the time and no detections 0.70% of the time. It should be noted that the occupants were asked to carry out their typical office tasks. Overall, this indicates that the selected model provides accurate detections within the desired office space. Figure 11 shows the results of the detection performance for a) each of the bounding boxes within the camera detection frame and b) each of the selected response outcomes of detected activities.
To provide a detailed analysis of the detection performance, the detection frames from the live detection were identified as Detection A, B, C and D. Figure 11a suggests an average detection accuracy of 92.20% for all activities. The highest detection accuracy (98.88%) was achieved for Detection D, and the lowest (87.29%) was observed for Detection A. The results also indicate the ability to identify the specific activities performed by each occupant during the detection period. However, detection performance cannot be based solely on the comparison between the results for Detections A–D, as not all activities were performed by the detected occupants. Further tests are necessary to fully assess its performance.
Figure 11b presents the detection performance based on the selected activities. The individual detection accuracies were 95.83% for walking, 87.02% for standing, 97.22% for sitting and 88.13% for none (when no occupant is present). This shows the capability of the deep learning model to recognise the differences between the corresponding human poses for each specific activity. The actions of standing and walking are more similar to each other than either is to sitting, which suggests why a higher accuracy was achieved for sitting than for standing and walking.
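As an illustration of how per-activity accuracies of this kind can be computed, the following minimal sketch compares observed and predicted activity labels frame by frame. The function name and the example labels are hypothetical and are not data from the study; the study's exact accuracy definition may differ.

```python
from collections import Counter

# Response outcomes used in the study.
ACTIVITIES = ("walking", "standing", "sitting", "none")


def per_activity_accuracy(observed, predicted):
    """Return {activity: % of frames observed as that activity and predicted correctly}."""
    totals = Counter(observed)
    correct = Counter(o for o, p in zip(observed, predicted) if o == p)
    # Activities that never occur in the observations are skipped.
    return {a: 100.0 * correct[a] / totals[a] for a in ACTIVITIES if totals[a]}


# Hypothetical frame labels for illustration only.
observed = ["sitting", "sitting", "standing", "walking",
            "standing", "none", "sitting", "walking"]
predicted = ["sitting", "sitting", "walking", "walking",
             "standing", "none", "sitting", "walking"]

accuracy = per_activity_accuracy(observed, predicted)
mean_accuracy = sum(accuracy.values()) / len(accuracy)
```

In this toy example one standing frame is mistaken for walking, which mirrors the kind of confusion between similar poses discussed above.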
This section highlights the importance of achieving high accuracy for all activity detections to enable an effective detection approach for building HVAC system controls. Since the accuracies achieved were based only on a small sample size, further model training and testing should be performed to achieve higher detection accuracy for the given occupancy activities and to enable the detection and recognition of a greater number of occupants within different types of office space environments. Figure 12a presents the number of detected occupants in the office space during the test. Figure 12b shows the number of detected and recognised occupants' activities during the test. This provides a better understanding of the occupancy patterns than the data shown in Figure 12a, which highlights the potential of the proposed approach.
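The per-interval occupancy and activity data described above could be derived from frame-level detections along the following lines. This is a sketch under stated assumptions: the input format (a timestamp in seconds plus one activity label per detection slot), the majority-vote rule and the function names are illustrative choices, not the authors' implementation.

```python
from collections import Counter, defaultdict


def summarise_per_minute(frames):
    """frames: iterable of (time_s, {slot: activity}).

    Returns {minute: {slot: most frequent activity in that minute}}.
    """
    votes = defaultdict(lambda: defaultdict(Counter))
    for time_s, detections in frames:
        minute = int(time_s // 60)
        for slot, activity in detections.items():
            votes[minute][slot][activity] += 1
    return {
        minute: {slot: counts.most_common(1)[0][0] for slot, counts in slots.items()}
        for minute, slots in sorted(votes.items())
    }


def occupant_count(minute_summary):
    """Number of detection slots whose activity is anything other than 'none'."""
    return sum(1 for activity in minute_summary.values() if activity != "none")


# Hypothetical frames sampled every 20 s for two detection slots.
frames = [
    (0, {"A": "sitting", "B": "none"}),
    (20, {"A": "sitting", "B": "none"}),
    (40, {"A": "standing", "B": "none"}),
    (60, {"A": "walking", "B": "sitting"}),
    (80, {"A": "walking", "B": "sitting"}),
    (100, {"A": "sitting", "B": "sitting"}),
]

summary = summarise_per_minute(frames)
counts = {minute: occupant_count(slots) for minute, slots in summary.items()}
```

Only the aggregated labels and counts are kept, which is consistent with outputting numerical and text-based data rather than storing images.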

Deep Learning Influenced Profile Results
Following the approach detailed in Figure 1, the data obtained from the live detection and recognition of the occupants were used to generate the DLIP. Figure 13 presents the DLIP formed from the experimental activity detection test results. The formation of the profile corresponds to the process indicated in Figure 6, with the activities of Detections A–D. The initial results showed that the DLIP could capture the various activities and identify the times when activity increased or decreased, resulting in variation of occupancy heat gains. The DLIP was plotted against the Actual Observation Profile, which defines the 'actual' occupancy activities performed and is used to assess the accuracy of the DLIP. From the comparison of the DLIP and the Actual Observation Profile, an average error of 0.04% was achieved. Because of prediction errors, the DLIP still alternates between activities at times, which suggests opportunities to further improve the accuracy, reliability and stability of the detection model.

Figure 14 presents two static occupancy profiles typically used in HVAC system operations and in building energy simulations to assume the occupancy patterns in building spaces. Both occupancy profiles were formed assuming constant occupancy in the building spaces and fixed values for occupant internal heat gains. Typical Office Profile 1 represents the average heat gain by a sitting person (115 W). Typical Office Profile 2 represents the average heat gain by a walking person (145 W). During the detection period, there was a 37.38% and 50.25% difference between the Typical Office Profiles 1 and 2, respectively, and the Actual Profile. Hence, a large discrepancy between the true occupancy activities performed within the building spaces and the scheduled occupancy profiles can be expected.
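The contrast between an activity-driven profile and a static profile can be sketched as follows. The 115 W (sitting) and 145 W (walking) values are those quoted above; the 130 W value for standing is a placeholder assumption, and the study's own rates are those listed in its Table 2. The function names, the example detections and the definition of percentage difference are likewise illustrative assumptions.

```python
# Heat gain per occupant in watts. Sitting and walking follow the text;
# the standing value is an assumed placeholder for illustration.
HEAT_GAIN_W = {"sitting": 115.0, "walking": 145.0, "standing": 130.0, "none": 0.0}


def heat_gain_profile(intervals):
    """Total occupant heat gain (W) per interval from the detected activities."""
    return [sum(HEAT_GAIN_W[a] for a in interval) for interval in intervals]


def static_profile(n_intervals, n_occupants, activity):
    """Constant occupancy with a fixed per-person heat gain, as in a scheduled profile."""
    return [n_occupants * HEAT_GAIN_W[activity]] * n_intervals


def percent_difference(profile, reference):
    """Relative difference of the summed profile against a reference profile, in %."""
    return 100.0 * (sum(profile) - sum(reference)) / sum(reference)


# Hypothetical activities for Detections A-D over four intervals.
detected = [
    ["sitting", "sitting", "none", "none"],
    ["sitting", "walking", "none", "none"],
    ["sitting", "standing", "sitting", "none"],
    ["none", "none", "none", "none"],
]

dlip = heat_gain_profile(detected)
typical_1 = static_profile(len(detected), 4, "sitting")
overestimate = percent_difference(typical_1, dlip)
```

Because the static profile assumes all four occupants are always present and seated, it overestimates the heat gain whenever occupants leave the space.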

Building Energy Performance Analysis
The following section provides an analysis of the impact of the proposed deep learning activity detection approach on building energy consumption during a typical working day. The generated DLIPs are compared with the static scheduled profiles in Figure 14.
Figure 15 presents the building energy simulation (BES) results of the occupancy sensible and latent gains. The Typical Office 1 and 2 results followed the assigned static scheduled occupancy profiles (Figure 14). Based on the simulated conditions, it can be observed that the typical office profiles overpredicted the occupancy heat gains within the room.
The DLIP results provided a better estimation of the occupancy internal heat gains. The occupancy heat gains were high from 09:00–10:00, when there was an increase in activity movement in the space. Lower occupancy heat gains were observed between 13:15–13:30, as most of the occupants had left the office space during this time. This shows the potential of the deep learning method to provide a more accurate estimation of the internal heat gains. Additionally, Figure 15b shows the predicted latent heat gains. The accurate prediction of the latent heat gains is important for the estimation of the required dehumidification load and can further reduce unnecessary energy usage. This is important for buildings located in tropical or humid climates, where it can lead to heavy usage of air-conditioning systems. The method should be further evaluated by incorporating it into buildings in different climates.

Based on the simulated conditions, the occupancy heat gains predicted using the Typical Office 1 and 2 profiles suggest an overestimation of 22.9% and 54.9%, respectively, compared with the Actual Observations. This is equivalent to 83.2 kWh and 199.8 kWh. In comparison, there was a 1.13% (… kWh) difference between the DLIP method and the Actual Observations.

Figure 17 shows the heating demand of the office space during a typical cold period in the UK, comparing the simulation results of the BES model with different occupancy profiles. Figure 17a presents the heating load across time, and Figure 17b compares the total heating loads for the selected day. The predicted heating load for the model with the DLIP profile was 375.5 kW, very similar to that of the Actual Observation profile, whereas the models with the Typical Office 1 and 2 profiles had heating loads of 372.0 kW and 371.8 kW, respectively. As expected, the DLIP and actual heat gains in the space were lower than those of the static profiles, which assumed constant activities in the space, and hence the heating requirement is higher in order to provide comfortable indoor conditions.
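The reported overestimations can be cross-checked with simple arithmetic. The sketch below assumes that each percentage is the excess heat gain divided by the Actual Observation total; under that assumption both static profiles should imply roughly the same actual total. The function name is illustrative and the implied totals are inferred here, not values stated in the study.

```python
def implied_baseline(excess_kwh, overestimate_pct):
    """Actual total (kWh) implied by an absolute excess and its relative overestimate."""
    return excess_kwh / (overestimate_pct / 100.0)


# Reported values: 22.9% (83.2 kWh) and 54.9% (199.8 kWh).
baseline_1 = implied_baseline(83.2, 22.9)    # roughly 363.3 kWh
baseline_2 = implied_baseline(199.8, 54.9)   # roughly 363.9 kWh

# Relative gap between the two implied baselines, as a consistency measure.
relative_gap = abs(baseline_1 - baseline_2) / baseline_1
```

The two implied totals agree to within about 0.2%, which is consistent with both percentages being taken against the same Actual Observation total.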

CONCLUSION
The study develops a deep learning vision-based activity detection and recognition approach to enable the generation of real-time data. The data can inform building energy management systems and the controls of an HVAC system to make adjustments based on the actual building conditions while minimising unnecessary loads. For the real-time detection and recognition of the common occupancy activities within an office space, a faster region-based convolutional neural network (Faster R-CNN) was developed, trained and deployed to an AI-powered camera. For the initial analysis, an experimental test was performed within an office space of a selected case study building. The approach provided correct detections for the majority of the time (97.32%). An average detection accuracy of 92.20% was achieved for all given activities. Higher accuracy was achieved for sitting (97.22%) than for standing (87.02%) and walking (95.83%). This is attributed to the similarity between the actions of standing and walking. Hence, it is important to further develop the model and enhance the accuracy for all activity detections to provide an effective occupancy detection approach for demand-driven systems.
The deep learning detection approach provides real-time data which can be used to generate a Deep Learning Influenced Profile (DLIP). Compared with the actual observation of the occupancy activities performed, a difference of 0.0362% was observed between the actual profile and the DLIP. Furthermore, the results suggest that the static or scheduled occupancy profiles currently used in most building HVAC system operations and in building energy modelling and simulations over or underestimate the occupancy heat gains. Based on the initial BES results and set conditions, a difference of up to 55% was observed between the DLIP and the static occupancy heat gain profiles, which is equivalent to 8.33 kW.

LIMITATIONS AND FUTURE WORKS
Occupancy behaviour and actions are unpredictable, so the results achieved in the present study cannot be generalised to all buildings and office spaces. Since the detection results were based only on a selected period within a small office space and a limited number of occupants, a series of tests within different types of buildings will be conducted in future studies to verify the feasibility of the approach in a diverse range of indoor environments. Furthermore, factors such as the position of cameras and the room environmental conditions, including obstruction and lighting conditions, would affect the detection accuracy. Hence, the impact of these factors will be further investigated in order to improve the model and adapt it to different environmental settings, so that the approach is effective in various building spaces. Moreover, the occupancy detection method will be developed continuously. This includes increasing the number of images within the model's image datasets, changing the model configuration for training, and testing the performance of various models selected for training. Other object detection models will also be explored and compared with the current model to provide greater insight into selecting and developing a detection method for effective building energy management and optimisation.

Figure 1. Overview of the proposed framework of a vision-based deep learning method to detect and recognise occupancy activities

The exceptional image classification performance of CNN [41], along with its flexibility [42] and popularity within the industry [43], influenced the selection of CNN over other neural network techniques when developing the vision-based occupancy detection and recognition solution. Derived from the understanding of the CNN, Figure 2 presents the CNN-based deep learning model configured for the training of the model for occupancy activity detection and recognition. Further discussion of the model configuration is outlined in the following subsections.

Next, the model was trained and deployed to an AI-based camera to allow the real-time detection and recognition of occupancy activities, as indicated in Part 2 of the workflow.

Figure 2. Convolutional Neural Network (CNN) based deep learning model configured for the training of the model for occupancy activity detection and recognition

Figure 4. Example images of various occupancy activities used within the image dataset for training and testing, which were obtained from a relevant keyword search in Google Images; the images were prepared via the labelling of the region of interest (ROI) of each image

[57] was selected. The TensorFlow detection model zoo consists of various networks pretrained on the Common Objects in Context (COCO) dataset [62]. These pretrained models are based on the most popular types of R-CNN frameworks used for object detection. Generally, R-CNN works by proposing bounding-box regions of interest (ROI) within the input image and using a CNN to extract features from each region for classification. Compared with R-CNN, Fast R-CNN runs faster because the convolution operation is performed only once for each image, rather than feeding a number of region proposals to the CNN every time. Both R-CNN and Fast R-CNN employ selective search to find the region proposals, which affects the computational time of model training and the performance of the network. Faster R-CNN instead uses a region proposal network (RPN) module as the attention mechanism to learn the region proposals [53].

Ren et al. [34] introduced the Faster R-CNN algorithm. It is similar to Fast R-CNN in that the input image is fed into the convolution layers to generate a convolutional feature map. The region proposals are then predicted by an RPN layer and reshaped by an ROI pooling layer. The image within each proposed region is then detected using the output of the pooling layer. Overall, all algorithms are suitable to enhance the performance of the network. However, according to the comparison of different CNN-based object detection algorithms [34], Faster R-CNN is much faster than the other algorithms and can be implemented for live object detection [63]. Furthermore, to improve the Faster R-CNN model, the inception module can help reduce the required computational time [64] and improve the utilisation of the computing resources inside the network to achieve a higher accuracy [53]. The Inception network is available in many forms, including Inception V1 to V4 [64, 65] and Inception ResNet [66]. Each version is an iterative improvement of the architecture of the previous one.

In this study, the COCO-trained model of Faster R-CNN (with Inception V2) was selected to develop the model for the real-time detection and recognition of occupancy activities. This was chosen due to the performance of Inception V2 and its widespread use in the development of object detection models such as in [34, 66]. Alamsyah and Fachrurrozi [67] used Faster R-CNN with Inception V2 for the detection of fingertips. Detection accuracies of 90–94% were achieved across all results, including cases with small variations between fingertips. This suggests that Faster R-CNN with Inception V2 can carry out detection tasks even with small changes. Furthermore, the Faster R-CNN with Inception V2 trained on the COCO dataset achieved an average speed of 58 ms and a mean average precision (mAP) of 28 for detecting various objects from over 90 object categories [57]. Hence, the model summarised in Figure 2, with the configured architecture and pipeline of the selected CNN model, was used for occupancy activity detection. Inputs from the CNN TensorFlow Object Detection API and the Faster R-CNN with Inception V2 model were also identified.
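Once a detector of this kind returns candidate boxes, scores and class indices for a frame, the output still has to be reduced to a small set of labelled detections. The sketch below shows one plausible post-processing step in plain Python. The class-index mapping, the 0.5 score threshold and the function name are assumptions for illustration and are not taken from the study; the limit of four boxes follows the experimental set-up described in the paper.

```python
# Hypothetical class-index mapping; the study's own label map is not given here.
LABEL_MAP = {1: "walking", 2: "standing", 3: "sitting", 4: "none"}


def select_detections(boxes, scores, classes, min_score=0.5, max_boxes=4):
    """Keep the highest-scoring detections above a threshold and label them.

    boxes:   list of (ymin, xmin, ymax, xmax) tuples in normalised coordinates
    scores:  list of confidence scores in [0, 1]
    classes: list of integer class indices
    """
    ranked = sorted(zip(scores, classes, boxes), key=lambda d: d[0], reverse=True)
    kept = [d for d in ranked if d[0] >= min_score][:max_boxes]
    return [
        {"activity": LABEL_MAP[cls], "score": score, "box": box}
        for score, cls, box in kept
    ]


# Hypothetical raw output for a single frame.
boxes = [(0.10, 0.10, 0.60, 0.30), (0.20, 0.50, 0.70, 0.65), (0.15, 0.70, 0.90, 0.95)]
scores = [0.97, 0.42, 0.88]
classes = [3, 1, 2]

detections = select_detections(boxes, scores, classes)
```

A higher threshold reduces incorrect detections at the cost of more frames with no detection, so the value would need tuning against test results.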

Figure 5. Sustainable Research Building at University Park Campus, University of Nottingham, UK: photo (a); experimental set up (b); 1st floor plan (c)

Figure 6. Process of forming the deep learning influenced profile from the application of the deep learning approach for occupancy activity detection and recognition

Figure 7. Deep learning model training results using the Faster-R-CNN with InceptionV2 model over the 6 hours 45 minutes training duration: total loss against the number of training steps (a); total classification loss against the number of steps (b)

Figure 8. Example of the confusion matrix for occupancy activity classification model

Figure 10. Overall detection performance during the experimental test, identifying the percentage of time achieving correct, incorrect and no detections

Figure 11. Detection performance based on: each of the bounding boxes within the camera detection frame of Detection A, B, C and D (a); each of the selected response outcomes of detected activities: walking, standing, sitting and none (b)

Figure 12. The number of detected occupants in the selected office space (a); the number of detected occupants performing each activity during the one-day detection period using the deep learning occupancy detection model (b)

Figure 13. Generated Deep Learning Influenced Profile (DLIP) based on the occupancy activity detection results with the corresponding actual observation for the selected one-day detection


Figure 16 presents a summary of the total sensible and latent occupancy heat gains.

Figure 16. Comparison of the total occupancy heat gains achieved using the deep learning approach in comparison with the different typical occupancy schedules

Figure 17. Heating load across time (a); total heating load for a selected typical cold period based on the assignment of the different forms of occupancy profiles: static profiles of Typical Office 1 and 2, 'true' Actual Observation and the use of the deep learning activity detection approach (b)

Tien, P. W., Wei, S., et al. Occupancy Heat Gain Detection and Prediction… Year 2021 Volume 9, Issue 3, 1080378

Table 1. The number of images and labels per category

Table 2. Selected heat emission rates of occupant performing activities within an office [69]

Table 3. Summary of the occupancy and building energy modelling profiles