Impact of data for forecasting on performance of model predictive control in buildings with smart energy storage

Data is required to develop forecasting models for use in Model Predictive Control (MPC) schemes in building energy systems. However, data is costly to both collect and exploit. Determining cost-optimal data usage strategies requires understanding of the forecast accuracy and resulting MPC operational performance it enables. This study investigates the performance of both simple and state-of-the-art machine learning prediction models for MPC in multi-building energy systems, using a simulated case study with historic building energy data. The impact on forecast accuracy of measures to improve model data efficiency is quantified, specifically for: reuse of prediction models, reduction of training data duration, reduction of model data features, and online model training. A simple linear multi-layer perceptron model is shown to provide equivalent forecast accuracy to state-of-the-art models, with greater data efficiency and generalisability. The use of more than 2 years of training data for load prediction models provided no significant improvement in forecast accuracy. Forecast accuracy and data efficiency were improved simultaneously by using change-point analysis to screen training data. Reused models and those trained with 3 months of data had on average 10% higher error than baseline, indicating that deploying MPC systems without prior data collection may be economical.


Background
The operation of building systems accounts for 30% of final energy consumption and 26% of energy-related carbon emissions globally [1]. Thus, decarbonizing building energy usage is necessary to achieve the targets of net-zero carbon emissions by 2050 [2]. The use of distributed generation and storage technologies in building energy systems can enable substantial reductions in operational emissions [3,4,5]. Additionally, smart energy storage systems can reduce the impact of building energy usage on the electrical grid by allowing energy flexibility through demand-side management and demand response [6], which is of particular importance in light of the expected electrification of heating demands [7]. The effectiveness of these systems, quantified by operational performance metrics of the building energy system such as total electricity cost, incurred carbon emissions, and measures of grid impact, is determined by the ability of the storage control scheme to arbitrage energy and alter the timing of net energy usage to meet these operational goals. Hence, the development of performant storage control strategies has received substantial research attention in recent years [8,9,10].
Model Predictive Control (MPC) is a prominent control methodology in the building energy systems literature. Its application to smart energy storage systems in buildings has been widely studied [11,12,13,14,15], and has been found to provide substantial operational performance improvements compared to Rule-Based Control (RBC) [15,16,17], the predominant technology used in existing real-world systems.
MPC is shown to yield equivalent performance to Reinforcement Learning Control (RLC) [18], the other major control methodology studied in the current literature. Review papers [19,20,21] provide a thorough overview of the applications of MPC to building energy systems with distributed generation and storage.
MPC requires a model of the system dynamics, which estimates the next state of the system given: (a) the current system state, (b) the current operational conditions, and (c) the applied control actions. Using this dynamics model and forecasts of the operational conditions over a given planning horizon, it predicts the operation of the system over that planning horizon. Optimization techniques are then used to determine the set of control actions that optimize a specified objective over the planning horizon. The performance of MPC therefore depends on: the accuracy of the system dynamics model, the accuracy of the operational condition forecasts, the ability of the optimization method to identify near-optimal control actions, and the match between the objective over the planning horizon and the global operational goals of energy management for the system. This work focuses on the accuracy of operational condition forecasts, and its resulting impact on the operational performance of MPC.

Forecasting models for MPC
A broad range of time series forecasting methodologies have been studied for the prediction of operational conditions for building energy management [22]; however, recent literature has focused on machine learning based methods [23,24,25,26,27] due to their promising performance. Successes in other fields have prompted the investigation of large-scale, high-complexity model architectures [28,29]. However, doubt remains as to whether such complex models are appropriate in the context of MPC for building energy systems [30,31,32], due to the computational and data availability limitations of practical systems. As such, this work considers both simple and high-complexity, state-of-the-art machine learning forecasting models.

Study of impact of data on forecast accuracy and MPC performance
As machine learning methods are purely data-driven, "black box" prediction models, achieving accurate forecasts requires training data that is representative of the building energy system for which the model will be used. However, acquiring representative training data is both challenging and costly. The costs of acquiring data include not only the capital costs of installing monitoring systems, but also the costs of digital infrastructure, data processing, system maintenance, and quality assurance required to support data collection. For the case of developing forecasting models for MPC considered in this study, there are additional project costs associated with delaying the installation of the battery system whilst data is gathered. [33] estimates the capital cost of electricity and gas smart-metering for a university campus to be $0.27/m², plus an additional $0.11/m² for maintenance and supporting IT systems. However, this considers only the cost of collecting data, and neglects the significant costs of training and deploying machine learning models [34], arising from both the computing and the expertise required. Whilst the importance of data availability and the impact of data on forecast accuracy are widely acknowledged in the literature [10,15,19,28,35], few works study the role of data in enabling good operational performance for MPC in building energy systems.
Determining cost-optimal data collection strategies to support the development of forecasting models for MPC in buildings requires understanding the trade-off between the quantity of data and data features used for model training (and their associated costs) and the operational performance achieved by the controller. This impact of data for forecasting on MPC operational performance can be considered in two stages: firstly, the relationship between data and the forecast accuracy of the resulting models; and secondly, the sensitivity of the controller's operational performance to the accuracy of those forecasts.
Only a single previous study investigating the impact of data on the accuracy of forecasting models for building energy management could be identified. This work, [28], compares the prediction accuracy of deep learning architectures for forecasting thermal loads and building zone temperatures over varying training dataset sizes. It finds that increasing the training data length does not always improve prediction accuracy, due to the strong seasonality of building energy behaviours, but that the addition of data with good similarity to the test dataset into the training dataset greatly improves forecast accuracy. A limitation of this study is that it analyses only a single year of thermal load and zone temperature data, synthesised from a building energy model and historical weather data, meaning that only a limited range of training dataset sizes, from 3 to 9 months, is considered. As a result, training data preceding the test data by one year, which due to seasonality is likely to have the highest similarity and so the greatest value, is unavailable, meaning the study of the benefit of additional training data is incomplete.
The impact of forecast accuracy on MPC operational performance has been studied to a limited extent. [16] compares the use of various classical and machine learning forecasting models in a common MPC framework, quantifying both the forecast accuracy and the resulting operational performance of MPC. However, the models studied all achieve comparable forecast accuracies and operational performances, meaning limited insight can be gained into the relationship between the two. Further, as comparison is made between model types, these results cannot be used to assess the benefit of forecast improvement for any individual model. [17] shows that the use of more accurate, external weather forecasts improves MPC operational performance, but does not quantify the forecast accuracies for comparison. In [36], the prediction accuracy and corresponding operational performance of MPC are quantified for two forecasts of electrical load with different levels of synthetic noise. Finally, [37] provides the most complete study, investigating the variation of operational performance with the noise amplitude of synthetic forecasts of temperature and solar irradiance. However, as the forecast accuracy of the synthetic predictions is not quantified, these results cannot be compared to the performance of practical prediction models, nor used to assess the benefits of forecast accuracy improvements.
The direct impact of training data length on operational performance is studied in [38]. It computes the operational performance achieved by a neural network based building thermal controller as the size of the training dataset varies, and finds that negligible performance improvements are obtained when using more than 8 months of training data. Whilst a comprehensive quantification of the relationship between training data volume and the operational performance achieved by the specific control scheme studied is provided, the results are specific to the atypical controller architecture used, which does not include the explicit forecasting of operational variables used in typical MPC schemes. Additionally, other aspects of data, such as the inclusion of additional data variables and the reuse of data from existing buildings, are not considered.
Questions of the impact of data on MPC performance in buildings, and of the optimal data collection strategies to support model development, have been explored in the System Identification field [35,39,40,41,42,43] in the context of developing accurate system dynamics models for MPC, termed 'control-oriented models'. Works have investigated the data requirements of different modelling approaches [35,39,40], the impact of data resolution on model accuracy [41], the impact of model prediction accuracy on operational performance [42], and cost-optimal data collection strategies to support model development [43].

Research objectives & novel contributions
In the existing literature, the impact of data on the prediction accuracy of forecasting models for building energy management has been studied to a very limited extent. Additionally, there has been no study of the trade-off between the quantity of data and data features used for model training and the operational performance of MPC for battery scheduling in systems with distributed generation and storage. Understanding of this trade-off is necessary to properly prioritise expenditure on data collection for smart energy storage systems. This study aims to address this research gap by quantifying the impacts of data on both the prediction accuracy and the operational performance of MPC, using simple and high-complexity, state-of-the-art machine learning based prediction models. Simulation of a multi-building energy system with distributed solar generation and battery storage, using historic building load measurement data, is used as a case study.
The main objectives of this study are:
• Compare the performance of simple and state-of-the-art machine learning models with regards to prediction accuracy, model generalisation, and data efficiency;
• Investigate the trade-off between data and forecast accuracy for the following data efficiency measures: reuse of prediction models, reduction of training data durations, reduction of model data features, and online model training;
• Propose strategies for improving prediction performance when selecting models for reuse and selecting data to exclude when reducing training data durations; and
• Quantify the relationship between forecast accuracy and the resulting operational performance of MPC.
The key contribution of this work is the combined study of the impact of data on both forecast accuracy and the resulting operational performance of MPC. This is important as it allows energy system designers to assess the trade-off between the cost of data for forecasting and the operational benefits it provides. Aspects of the impact of data on forecast accuracy not yet studied in the context of building energy management are also investigated, specifically the reuse of prediction models, the selection of model data features, and online model training. Two strategies for improving the efficacy of collected data for building load forecasting are proposed: a load profile similarity metric for selecting prediction models for reuse, and a change-point detection based methodology for screening training data to improve prediction performance whilst reducing model training time. Further, a long-duration (10 year) historic building energy dataset is used to conduct these experiments, allowing performance to be evaluated on a multi-year scenario and providing more robust testing than existing studies.
The remainder of this work is structured as follows. Section 2 describes the simulation framework and data used to perform the experiments, as well as the models tested and how their performance is evaluated.
In Section 3.1, the forecast accuracy of the prediction models trained without data duration limitations is evaluated to provide a baseline, and model performance is compared. Model generalisation for load prediction between buildings is then tested in Section 3.2, to assess whether model reuse is a viable strategy for reducing data collection requirements for new smart energy storage systems. A load profile similarity metric based on the Wasserstein distance between functional Principal Component Analysis (fPCA) coefficient distributions is proposed, and its efficacy as a criterion for selecting models for reuse is tested. Section 3.

Smart building control simulation framework
To study the impacts of data on forecasting and MPC performance in the context of building energy systems, a case study of an example multi-building energy system was simulated. In this case study, MPC is used to schedule battery storage operation in a multi-building energy system containing distributed storage and solar generation, so as to reduce the electricity cost, carbon emissions, and grid impact associated with meeting the electrical demand of the buildings. The CityLearn [44,45] building energy control framework is used to simulate the behaviour of the building energy system, and to provide the required data to a Linear Program based MPC implementation. A schematic of the energy flows within the simulated multi-building energy system is provided in Fig. 1.
During simulations, at each time step the prediction models use observation data to produce forecasts of the operational variables, which are passed to a linear predictive control model. The resulting linear optimisation problem is solved to determine the optimal control actions, which are then applied to the battery units in the CityLearn simulation. The combination of the prediction models and the linear predictive control model comprises the Linear MPC controller.
The Linear Program formulation used in the MPC scheme is described by Eq. 1, with Table 1 providing descriptions of the parameters. At each time step, the optimised control actions for the first step of the planning horizon, E_i*[τ=0], are taken.
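As an illustration of this receding-horizon structure (not the full multi-building formulation of Eq. 1), a toy single-battery version of the scheduling LP can be sketched as follows; the battery parameters, price forecast, and function names are illustrative assumptions only:

```python
import numpy as np
from scipy.optimize import linprog

def mpc_battery_action(soc, prices, capacity=1.0, max_power=0.25):
    """Toy single-battery receding-horizon LP: plan charge/discharge
    against a price forecast over the planning horizon, then return
    only the net first planned action (cf. E_i*[tau=0] in the MPC scheme)."""
    T = len(prices)
    p = np.asarray(prices, dtype=float)
    L = np.tril(np.ones((T, T)))              # cumulative-sum matrix giving SOC
    # Decision vector x = [charge_0..T-1, discharge_0..T-1], all >= 0.
    cost = np.concatenate([p, -p])            # cost of net grid energy bought
    A_ub = np.block([[L, -L], [-L, L]])       # SOC <= capacity and SOC >= 0
    b_ub = np.concatenate([np.full(T, capacity - soc), np.full(T, soc)])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, max_power)] * (2 * T))
    return res.x[0] - res.x[T]                # net first action only
```

At each simulated time step the LP would be re-solved with updated forecasts, so only this first action is ever executed.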
The optimisation objective comprises three weighted components, corresponding to the cost of grid electricity consumed by the buildings (assuming no net electricity metering), the embodied carbon emissions associated with the grid electricity, and the ramping of the overall grid electrical demand, which represents the grid impact. The three components are normalised by the values they would take if no battery storage were present in the buildings, denoted O_k for component k, lower bounded at 1. This clipping is performed to prevent ill-conditioning of the objective when the no-storage objective values are small.
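A minimal sketch of this normalised, weighted objective (function and argument names are illustrative):

```python
def weighted_objective(components, no_storage_values, weights):
    """Weighted sum of objective components (price, carbon, ramping),
    each normalised by its no-storage value O_k, with O_k lower
    bounded at 1 to avoid ill-conditioning when it is small."""
    return sum(gamma * c / max(o, 1.0)
               for c, o, gamma in zip(components, no_storage_values, weights))
```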
The CityLearn simulations are configured so that the building energy system has linear dynamics, which is possible as a solar-battery system is studied and only electrical behaviour is considered. As the system parameters are known to the controller, the MPC has a perfect model of the true system dynamics. This perfect match between the simulator and controller dynamics models means no inaccuracies are introduced in system identification. The optimality guarantees of Linear Programming, and the use of the global operational objective as the MPC objective with a sufficiently large planning horizon T, mean there is negligible distortion of the operational performance from these factors. This allows the effect of operational condition forecast accuracy on MPC operational performance to be studied in isolation; that is, to a good approximation, sub-optimality in operational performance in the simulation environment is caused solely by forecasting inaccuracies.

(Table 1: descriptions and units of the decision variables and parameters of Eq. 1; not reproduced here.)

Forecasting task & performance evaluation
For the MPC scheme used (Eq. 1), forecasts of the following operational condition variables over the planning horizon T are required at each time instance t:
• electrical demand for each building, L_i[t]
• normalised solar PV generation power, g[t]
• price of grid electricity, p_e[t]
• carbon intensity of grid electricity, c[t]
This study investigates the impacts of data on the forecasting of these 4 types of operational variables.
The accuracies of these forecasts are quantified using two error metrics, the normalised Mean Absolute Error (nMAE) and the normalised Root Mean Squared Error (nRMSE), given by Eqns 2 & 3, where f^v_{t,τ} is the forecast of variable v made at time t for time instance τ in the planning horizon, and v_{t+τ} is the true value of the target variable at time instance t + τ. These error metrics are the means of the standard MAE and RMSE errors over all forecasting horizons considered in the simulation, normalised by the mean level of the target variable to allow comparability between forecast accuracies.
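Concretely, with forecasts and targets flattened over all forecast origins and horizon steps, the two metrics can be computed as follows (a sketch consistent with Eqns 2 & 3; the exact definitions are given there):

```python
import numpy as np

def nmae(forecasts, actuals):
    """Normalised MAE: mean absolute error over all forecast origins
    and horizon steps, divided by the mean level of the target."""
    f, a = np.asarray(forecasts, float), np.asarray(actuals, float)
    return float(np.mean(np.abs(f - a)) / np.mean(np.abs(a)))

def nrmse(forecasts, actuals):
    """Normalised RMSE: root-mean-square error over all forecast
    origins and horizon steps, divided by the mean level of the target."""
    f, a = np.asarray(forecasts, float), np.asarray(actuals, float)
    return float(np.sqrt(np.mean((f - a) ** 2)) / np.mean(np.abs(a)))
```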
The operational performance achieved by the MPC scheme using a given set of prediction models is quantified by evaluating the objective specified in Eq. 1 with the simulated behaviour of the building energy system resulting from the use of the controller.
All experiments conducted in this study test the same multi-building energy system, with distributed energy assets as specified in Appendix A, and use a planning horizon of T = 48 hrs (justification of this value is provided in Appendix B) and objective component weights (γ_p, γ_c, γ_r) = (0.45, 0.45, 0.1), in both the MPC scheme and for operational performance evaluation.

Cambridge Estates electricity use dataset
A dataset of historic building electricity usage measurements from a set of buildings across the Cambridge University Estates, covering the period 2010 to 2019 [46], is used to provide a case study for the multi-building energy system. The dataset consists of 10 years of historic electricity usage data from 30 buildings of varying use types in the University Estate, such as lecture blocks, offices, laboratories, and museums, alongside weather observations and grid electricity price and carbon intensity data. The electricity usage measurements record the total electrical load of each building, which includes lighting, plug loads, and plant equipment electricity consumption. It is assumed that none of the buildings have heat pumps or AC units installed, meaning the electricity usage does not include any contributions from space heating or cooling.
In all, the dataset contains the following variables: building electrical load data; weather data for Cambridge from the Met Office MIDAS dataset [47] (temperature and relative humidity) and the renewables.ninja reanalysis model [48,49] (direct and diffuse solar irradiance); dynamic electricity pricing tariff data from Energy Stats UK [50]; grid electricity carbon intensity from the National Grid ESO Data Portal [51]; and temporal information including hour, day, and month indices, as well as daylight savings status. All data is available at hourly resolution. Further detail on the data sourcing and processing is available in reference [46].
The 10 years of available data is initially split into training, validation, and test datasets covering the following periods: train (2010 to 2015), validate (2016 to 2017), test (2018 to 2019). For all experiments the test data is kept the same; however, the periods of data used to train the prediction models are altered in Section 3.3. Of the 30 buildings available, 15 are selected for use in the experiments, such that they cover a wide range of building scales and provide a good mix of similarity and dissimilarity with respect to their electricity loads.

Brief description of the prediction models
Data and computational requirements, which vary across prediction models, are important considerations for the deployment of MPC based controllers in practical building energy systems. This work investigates the performance of 6 machine learning based prediction models which span a range of model characteristics: 3 simple neural models, and 3 high-complexity, state-of-the-art models. A brief description of each model architecture follows. Technical specifications of the model implementations used are provided in Appendix C.

Simple neural models
Recent literature [30] has shown that simple neural models using Direct Multi-Step forecasting (DMS), where all predictions over the forecast window are generated concurrently, can outperform complex, transformer-based models using traditional Iterated Multi-Step forecasting (IMS), in which a single-step forecaster is applied iteratively to generate a multi-step forecast. For all three simple neural architectures investigated, DMS forecasting is used.

Linear neural network (Linear). A Multi-Layer Perceptron (MLP) model that maps the inputs directly to the output without an activation function (non-linearity).
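Since this model is mechanistically equivalent to multi-output linear regression, a minimal DMS version can be fit in closed form with ordinary least squares (a sketch only; the implementation used in the study is specified in Appendix C, and the function names here are illustrative):

```python
import numpy as np

def make_windows(series, input_len, horizon):
    """Slice a series into (input window -> multi-step target) pairs
    for Direct Multi-Step (DMS) forecasting."""
    X, Y = [], []
    for t in range(len(series) - input_len - horizon + 1):
        X.append(series[t : t + input_len])
        Y.append(series[t + input_len : t + input_len + horizon])
    return np.array(X), np.array(Y)

def fit_linear_dms(series, input_len=168, horizon=48):
    """Fit a linear map (no activation) from the input window to all
    horizon steps at once, i.e. multi-output least squares."""
    X, Y = make_windows(series, input_len, horizon)
    X = np.hstack([X, np.ones((len(X), 1))])     # bias term
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_linear_dms(W, window):
    """Generate the full multi-step forecast in one shot (DMS)."""
    x = np.concatenate([window, [1.0]])
    return x @ W
```

On strongly periodic data (such as building loads with daily cycles), this closed-form fit already recovers the dominant lag relationships, which is consistent with the generalisation behaviour discussed later.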
Residual multi-layer perceptron (ResMLP). A residual MLP model (an MLP with skip-connections), comprising a single hidden layer with 128 neurons.

Convolutional neural network (Conv).
A Convolutional Neural Network (CNN) model that contains convolution layers followed by a linear layer. The architecture used comprises two layers with kernel sizes of 6 and 12, with five channels and one channel, respectively.

State-of-the-art machine learning models
Temporal Fusion Transformer. The Temporal Fusion Transformer (TFT) model [52], developed by Google, is an attention-based architecture that enables the fusion of data from multiple input sources to inform predictions. The neural structure contains features which allow for the learning of multiple underlying relationships across temporal scales, and the attention mechanism allows for interpretation of the model predictions, i.e. which data the model is exploiting to produce its forecasts. The model uses categorical covariates of date-time information, as well as temperature information, for predicting building loads.
Neural Hierarchical Interpolation for Time Series Forecasting. Neural Hierarchical Interpolation for Time Series Forecasting (NHiTS) [53] is an MLP based model which learns a set of basis functions at different frequencies that describe the underlying patterns in the training data, and produces forecasts by using hierarchical interpolation to combine predictions from the basis functions in a computationally efficient manner. It uses categorical covariates of date-time information.
DeepAR. DeepAR is a Recurrent Neural Network (RNN) based model developed by Amazon [54], which has been widely applied across a range of research areas. It is a probabilistic forecasting model, but for this study only the mean prediction is used. The model uses categorical covariates of date-time information.

Baseline prediction accuracy comparison
The prediction accuracy of each forecasting model trained using 8 years of training data, the maximum available, was evaluated to provide a baseline for the model architectures in a setting without data limitations. For brevity, forecast accuracy results are discussed for the nRMSE metric only; however, equivalent results were found with the nMAE metric. Figures 2a & 2b show that the simple Linear model achieves similar or better prediction accuracy (lower nRMSE values) compared to the complex, state-of-the-art models across all prediction variables. The Conv and ResMLP models provide similar accuracy when predicting building electrical loads; however, both have significantly worse accuracy for the electricity price and carbon intensity prediction variables. Complex models achieve slightly better accuracy for electricity price and solar generation predictions, at most 8.5% and 4.2% lower nRMSE than the Linear model, achieved by NHiTS and TFT respectively. However, for some buildings the complex models exhibit very poor forecast accuracy for load predictions. These instances of poor accuracy are found to correlate with a measure of the similarity between the train and test datasets for building load, called the 'Wasserstein similarity metric', which is described in Appendix D and is used for the study of model reuse in the following section. Fig. 2c plots the relationship between prediction accuracy and data similarity for each model, and shows that complex models provide poor prediction accuracy when the training and test data are significantly different. Hence, simple neural models provide better prediction generalisation under changes in building load dynamics between the training and test data, which is highly advantageous for application to practical systems, as occupant driven load dynamics may change after system installation, e.g. due to a change of building use.
These results indicate that the simple neural models have sufficient expressivity to capture the underlying trends in building energy usage well, but learn sufficiently simple relationships about the data as to avoid the over-fitting experienced by the complex models, leading to better generalisation in time. This suggests that the majority of the trends in the load data are present within the 1 week input window used by the simple models, which agrees with the strong daily and weekly trends of typical building energy usage, caused by building occupancy patterns. It is likely that annual trends are well captured by the mean value over the input window, as these trends are slow relative to the prediction length. Additionally, the predictions required by MPC are relatively short in length, 48 hours in this case, whereas the complex models are found to provide more stable predictions and greater overall accuracy for longer duration forecasts [53,55]. Overall, simple neural models are able to provide analogous prediction accuracy to complex models across all prediction variables, and do so with substantially lower computational requirements. Table 2 provides the training time and inference time (time to generate forecasts) of the baseline models, and shows that the simple models require roughly 500x less computation for inference. Hence, in practical systems the use of simple models would enable shorter prediction intervals, leading to higher frequency control, and allow for the use of lower cost compute hardware. Further, due to their high computational cost, the complex models were only able to use a 72 hour input window [56]. As a result, they have less information available to inform their predictions, likely limiting the accuracy they could achieve. Therefore, the computational efficiency of simple neural models, enabling the use of longer input windows, provides an additional advantage. The similar prediction accuracies of the tested models are found to lead to similar operational performance when used in the MPC scheme (see Fig. 2d), where the dashed line indicates the bound on operational performance achieved by an MPC scheme with perfect forecasts. The use of the MPC controllers with the specified battery systems leads to an average 8.7% improvement in operational performance for the multi-building energy system, with a range of 7.3% to 9.8%.

Model generalisation
When a solar-battery system using MPC is installed, high-resolution historic load metering data (e.g. from smart meters) may not be available for the building. In this case, the project must either be delayed to allow time for data collection, incurring a significant cost, or a prediction model trained on load data from another building must be used, potentially incurring an operational performance penalty due to worse prediction accuracy. Model reuse greatly reduces data collection requirements and the associated costs; however, its appropriateness depends on the ability of load prediction models to generalise between buildings, and on the trade-off between data cost savings and increased operational cost due to lower forecast accuracies.
The generalisation of prediction models between buildings is tested by using the baseline models trained on data from each of the 15 buildings to forecast electrical load for every other building. Fig. 3 shows the distribution of relative forecast accuracies achieved by each model over all buildings (DeepAR is excluded due to excessive computational costs), where the y-axis plots the nRMSE forecast accuracy of the tested model (potentially trained on a different building) normalised by the forecast accuracy achieved by the model trained on data from the target building. The Linear model provides similar generalisation performance to NHiTS, and is substantially better than all other models. It is proposed that the Linear model, being mechanistically equivalent to linear regression, achieves good generalisation as it learns relatively simple relationships about the load data, which are consistent between buildings, whereas NHiTS achieves good generalisation due to its behaviour of generating smooth forecasts [53], making it less susceptible to producing erratic and inaccurate predictions with unseen input data. As before, TFT suffers from over-fitting, leading to relatively poor generalisation. Whilst the Conv model was able to provide good generalisation in time where the dataset similarity was close, indicated by small Wasserstein metric values in Fig. 2c, between buildings the load data is much less similar, and the relationships learned by the Conv model are no longer valid, leading to poor prediction accuracy. It is suggested this is due to the features learned by the pooling process no longer being pertinent for the new building.
If a pre-trained Linear model is selected for reuse at random, the average prediction accuracy is substantially worse than that of a model trained on data collected from the target building (45% higher nRMSE, corresponding to the mean of the violin plot). However, in a real scenario where an ensemble of pre-trained models is available, the smart energy storage system designer could reuse a model from a building that is expected to be similar to the target building (e.g. by size, usage type, envelope characteristics, load dynamics, etc.), to maximise the likelihood of good generalisation. A load profile similarity metric is additionally proposed as a way of assessing the similarity of building load dynamics for the purposes of model reuse.
This similarity metric is based on the similarity of the distributions of underlying mode shapes within each load profile, and is referred to as the 'Wasserstein similarity metric' as it uses the Wasserstein distance between mode shape coefficient distributions. Appendix D describes the calculation of similarity metric values. The correlation between the generalisation of the Linear model and the Wasserstein similarity metric values between the training datasets of the reused model building and the target building is shown in Fig. 4.
The cluster of points in the bottom left indicates that buildings with similar load profiles (low Wasserstein metric values) can provide models that achieve good generalisation. When selecting models for reuse by minimising the Wasserstein metric, the average relative prediction accuracy improves to an 11% increase in nRMSE, with a range of -7.9% to +52.6%, showing that some reused models outperform those trained on the target building's own training dataset.
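The selection step above, choosing from an ensemble of pre-trained models the building whose load dynamics are most similar, can be sketched as follows. This is a simplified illustration that assumes the per-building fPC coefficient samples of Appendix D are available as plain arrays; the function names and the one-dimensional quantile approximation of the Wasserstein distance are this sketch's own (`scipy.stats.wasserstein_distance` would serve equally well).

```python
import numpy as np

def wasserstein_1d(u, v, n_quantiles=100):
    """Approximate 1D Wasserstein distance between two empirical samples
    as the mean absolute difference between their quantile functions."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean(np.abs(np.quantile(u, q) - np.quantile(v, q))))

def select_model_for_reuse(target_coeffs, candidate_coeffs):
    """Pick the pre-trained model whose building has fPC coefficient
    distributions closest to the target building's.

    target_coeffs: dict mapping fPC name -> 1D array of coefficients
    candidate_coeffs: dict mapping building id -> same structure
    Returns the building id minimising the summed Wasserstein distance."""
    scores = {}
    for bldg, coeffs in candidate_coeffs.items():
        scores[bldg] = sum(
            wasserstein_1d(target_coeffs[k], coeffs[k]) for k in target_coeffs
        )
    return min(scores, key=scores.get)
```

In practice the distances would be pre-computed between full training datasets, as in Fig. D.16, rather than on demand.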
These results show that when developing a load forecasting model for a new building system where little or no data is available, the reuse of forecasting models trained on existing buildings provides a promising alternative to the collection of data and training of a building-specific model, and the associated costs.
However, reusing prediction models increases uncertainty in the operational performance the MPC system will achieve, as models trained on load data from buildings similar to the target building do not always provide good prediction accuracy, as shown by the wide range of reused model accuracies.
Further work is required to determine the feasibility of identifying similar buildings from the model ensemble in practice.
Whilst a small sample of load data from the target building could be used to approximate the Wasserstein similarity metric, the quality of this approximation must be weighed against the forecast accuracy that could be achieved by a model trained directly on that data. Additional information on the building, such as usage type, could also be used to guide model selection for reuse.

Data efficiency
Data required for developing forecasting models incurs costs from both its acquisition and the computation required to exploit it. However, using longer durations of training data or additional data variables can increase forecast accuracy, resulting in lower operational cost of the energy system. Therefore, any measure to reduce the quantity of data used must be traded off against the corresponding decrease in prediction accuracy. This section explores a series of aspects of data efficiency and their effect on forecast accuracy. Subsequently, Section 3.4 investigates the impact of forecast accuracy on the operational performance of the MPC.
Testing data efficiency requires several versions of a prediction model to be trained. Due to computational limitations, some data efficiency tests study only the Linear prediction model, as the complex models are too expensive to retrain repeatedly, and results from previous sections show the Linear model is the best-performing simple model.

Volume of training data
Using greater volumes of training data, i.e. longer durations of historic building measurements, can improve model prediction accuracy by avoiding over-fitting and improving temporal generalisability, but comes at a roughly linear cost. The trade-off between the length of training data used (the combined duration of the train and validation datasets) and forecast accuracy is investigated by training prediction models on subsets of the building energy dataset of varying durations, from 8 years (the maximum available) down to 3 months (one season), the shortest duration tested in [28]. For the complex models, at least 2 years of training data was required due to the use of temporal covariates. Fig. 5 plots the proportional improvement in prediction accuracy of each model (negative improvement indicates worsening accuracy) compared to baseline when training using different data volumes. Across a broad range of model architectures, reducing training data length down to 2 years has limited impact on prediction accuracy, in some cases even improving it. Trends differ between prediction variables; however, for the Linear model, in the majority of cases there is a small prediction accuracy penalty between 8 years and 1 year of training data, and then a rapid worsening of forecast accuracy below 6 months. This indicates that when making data collection decisions to support prediction model development for building electrical energy systems, at most 2 years of measurement data need be gathered. However, at least two seasons of data (6 months) are required to prevent model over-fitting and to learn trends which generalise across seasons.
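The training-duration trade-off can be explored with a sketch like the following, which fits a simple linear autoregressive model (a stand-in for the study's Linear model, not its actual implementation) on trailing windows of varying length and reports test nRMSE for each duration. All function names and the hourly lag structure are illustrative assumptions.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalised by the mean of the true series."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(y_true))

def sweep_training_durations(load, horizon, durations, lags=24):
    """Fit a linear autoregressive model on trailing windows of varying
    length and report test nRMSE for each training duration (in hours)."""
    test = load[-horizon:]
    results = {}
    for dur in durations:
        train = load[-(horizon + dur):-horizon]
        # lagged design matrix: predict load[t] from the previous `lags` hours
        X = np.array([train[t - lags:t] for t in range(lags, len(train))])
        y = train[lags:]
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        # one-step-ahead predictions over the test window, using actuals as inputs
        hist = np.concatenate([train[-lags:], test])
        preds = np.array([hist[t - lags:t] @ w for t in range(lags, len(hist))])
        results[dur] = nrmse(test, preds)
    return results
```

Applied to real load data, the resulting nRMSE-versus-duration curve is the single-model analogue of the accuracy trends plotted in Fig. 5.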
Screening training data using change-points. Once building data has been gathered, there remains a question as to whether all of the available data should be used for training, or whether some should be excluded to improve model prediction. For instance, if data that is non-representative of the present building load dynamics can be excluded, prediction accuracy can be improved alongside data efficiency. Change-point analysis can be used to screen training data for this purpose, by detecting changes in building load dynamics and excluding data preceding the change from training. A change-point analysis using the BEAST algorithm [57] was performed on the building load dataset, as described in Appendix E. Points where changes in load trend were detected were used to screen the training data for 7 selected buildings. To allow a better analysis of the detected change-points, the training dataset was increased to 7 years (2010-2016), leaving 1 year of validation data (2017). Linear prediction models were trained using sections of the training data starting at each of the detected change-points and running to the end of 2016.
Fig. 6 shows the relative improvements in prediction accuracy of models trained on data screened using change-points, compared to the case where the full 7 years of available data is used, where the x-axis position indicates the timing of the change-point. For most buildings, screening the training data using change-points improves prediction accuracy, indicating that non-representative data is being removed from the training dataset. This suggests that ex-post training data screening can be used to improve model accuracy whilst reducing the quantity of training data used, and hence computational cost. These results correspond with the findings of [28], which show that additional training data only improves prediction accuracy if it is sufficiently similar to the test data, and can reduce accuracy when insufficiently similar. However, the results of this study demonstrate an additional consideration: prediction accuracy worsens substantially when insufficient training data volume is used. This suggests a trade-off between having enough data to avoid over-fitting, and removing non-representative data which causes over-generalisation of the model. Figures 5 & 6 indicate that at least 1 year of training data is required to achieve good prediction accuracy, after which only sufficiently similar data should be included in the training dataset. The difference in results compared to [28] is likely because this study tests model prediction accuracy on 2 years of data, compared to 0.1 years in [28]. As a result, models in this study must capture the seasonality of building load dynamics, hence requiring at least 1 year of training data to achieve good prediction accuracy.
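A minimal sketch of the screening idea, assuming hourly load data: detect the largest mean shift in the window-averaged load trend and keep only the data recorded after it. This is a deliberately crude stand-in for BEAST, which additionally models seasonality and reports change probabilities; all names here are hypothetical.

```python
import numpy as np

def largest_mean_shift(x, min_seg=4):
    """Index of the split minimising the two-segment squared error: a crude
    single change-point detector over a trend series."""
    best_i, best_cost = None, np.inf
    for i in range(min_seg, len(x) - min_seg + 1):
        cost = (np.sum((x[:i] - x[:i].mean()) ** 2)
                + np.sum((x[i:] - x[i:].mean()) ** 2))
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i

def screen_training_data(load, window=24 * 7):
    """Drop training data preceding the detected change in the
    window-averaged (e.g. weekly) load trend."""
    n = len(load) // window
    trend = load[:n * window].reshape(n, window).mean(axis=1)
    cp = largest_mean_shift(trend)
    return load[cp * window:]
```

In the study's setting the screening point would instead come from BEAST's inferred trend change-points (Fig. E.18), with one model trained per candidate change-point.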
The data screening method considered in this study is relatively simple, using change-points to exclude data by classifying it as 'non-representative' of current behaviour. More advanced techniques could be developed by quantifying the similarity between the sections of data identified by the change-points and the validation data, e.g. using the Wasserstein similarity metric, and selecting sections of data for the training dataset using both the probability of a true change having occurred and the data similarity metrics.
Change-point analysis can also be used online to detect real-time changes in building load dynamics and indicate when the prediction models should be updated (i.e. retrained using recent data), because the training data of the current model is no longer representative of the building behaviour, causing reduced prediction accuracy. This is particularly pertinent in the context of climate change, which is expected to significantly alter the energy usage behaviour of buildings, for instance reducing peak heating loads due to higher winter temperatures, and increasing summer electrical loads to provide cooling during prolonged heat waves.

Data features
The variables that can be monitored in building energy systems are often well correlated, for example ambient temperature and electrical load. As a result, covariates can be used by prediction models to attempt to identify underlying links between the variables and improve forecast accuracy. However, the use of additional data variables (features) incurs a cost from both the collection and exploitation of the extra data, for instance the purchase of proprietary weather forecasts. The impact of feature selection on forecast accuracy is investigated by comparing the prediction accuracy of models trained using varying numbers of data features. The Pearson correlation coefficients between the 11 data features available in the building energy dataset are shown for the case of Building 0 in Fig. 7. Linear prediction models were trained using the n most correlated variables/features for each of the electrical load, solar generation, electricity price, and carbon intensity target variables. Feature selection was performed separately for each case by ranking Pearson correlation coefficients between the target variable and all other available covariates. Fig. 8 shows the impact of the number of data features used by the models on forecast accuracy. For most prediction variables, the inclusion of additional data features worsens forecast accuracy. Several factors could contribute to this behaviour:
• Over-fitting [58]: the incorporation of an excessive number of features may lead the model to over-fit the training data, worsening its predictions for unseen data.
• Multi-collinearity [59]: the presence of highly correlated features can destabilise the model, resulting in worse prediction accuracy.
• Curse of dimensionality [60]: an increase in the number of features increases the dimensionality of the data, which can lead to data sparsity and degraded prediction accuracy.
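The correlation-based feature selection described above can be sketched as follows, assuming the candidate features are available as 1D arrays keyed by name; the helper name is hypothetical.

```python
import numpy as np

def top_n_features(data, target, n):
    """Rank candidate features by absolute Pearson correlation with the
    target variable and return the n highest-ranked feature names.

    data: dict mapping feature name -> 1D array
    target: name of the prediction target within `data`"""
    y = data[target]
    corrs = {}
    for name, x in data.items():
        if name == target:
            continue
        corrs[name] = abs(np.corrcoef(x, y)[0, 1])
    return sorted(corrs, key=corrs.get, reverse=True)[:n]
```

A separate ranking would be produced for each target variable (load, solar generation, price, carbon intensity), mirroring the per-target selection used for Fig. 8.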
These results show that testing should be performed before decisions are made regarding the collection of covariate data, to ensure that, for the type of system in question, its use will firstly improve prediction accuracy, and secondly that said accuracy improvements warrant the cost of the data. This study considers a simple, correlation-based feature selection method; however, more advanced, optimisation-based techniques [61,62] could be used to determine the set of available features which provides the optimal trade-off between data cost and model prediction accuracy.

Turning to online training (Fig. 9), higher update frequencies led to increased prediction accuracy across all prediction variables. For example, prediction accuracy for grid electricity price improved by 15.7%, 14.0%, 12.1%, and 8.1% when updated monthly, quarterly, semi-annually, and annually, respectively, compared to the baseline model, which is not trained online. Online training allows the prediction model to adapt to changes in the underlying trends which occur over time, as it can learn the trends in recent data. This improves prediction accuracy as the model does not need to generalise in time, since it is trained (updated) on data that is representative of the current prediction horizon. Therefore, it is expected that the additional computational hardware and system complexity required for online-trained prediction models will be worthwhile in practical systems.
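The online training procedure, periodically refitting on a trailing window of recent data, can be expressed as a skeleton loop like the one below. The `fit`/`predict` callables and the monthly update and one-year window defaults are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def rolling_online_training(series, fit, predict, update_every=24 * 30,
                            window=24 * 365):
    """Skeleton of an online training loop: periodically refit the model on
    a trailing window of recent data, so predictions track drifting dynamics.

    fit: callable(train_window) -> model
    predict: callable(model, history) -> next-step prediction"""
    model, preds = None, []
    for t in range(window, len(series)):
        if model is None or (t - window) % update_every == 0:
            model = fit(series[t - window:t])  # retrain on the most recent window
        preds.append(predict(model, series[:t]))
    return np.array(preds)
```

Varying `update_every` from monthly to annual reproduces, in structure, the update-frequency comparison of Fig. 9.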

Sensitivity of Model Predictive Control to forecast accuracy
Whether investments in additional data to improve forecast accuracy provide net benefit to the operation of a smart energy storage system depends on the resulting improvements in operational performance achieved by the MPC controller. Determining optimal data collection strategies therefore requires quantification of the relationship between forecast accuracy and MPC operational performance. To investigate this in a controlled setting, the operational performance achieved by the tested MPC scheme is evaluated with the use of synthetic forecasts. The synthetic forecasts are produced by adding a Gaussian random walk noise component to the ground-truth values, as described in Eq. 4, where v_{t+τ} is the ground-truth value of variable v at instance τ in the planning horizon of the forecast created at time t, and the noise level σ can be selected. The MPC scheme using these synthetic forecasts for all prediction variables is tested at varying noise levels. Fig. 10a shows the resulting operational performance of the MPC, as well as the three components it comprises (electricity price, carbon emissions, and grid impact), as the prediction accuracy of the synthetic forecasts varies. Fig. 10b shows the variation in overall operational performance of the MPC when synthetic forecasts are used for each type of prediction variable in turn, with perfect forecasts used for all other variables. The horizontal line at 1 indicates the performance of the building energy system without battery control, which is the point at which the MPC controller becomes redundant.
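A sketch of the synthetic forecast construction, assuming Eq. 4 adds cumulative Gaussian noise so that forecast error grows with lead time τ; the exact formulation in the paper may differ in detail.

```python
import numpy as np

def synthetic_forecast(ground_truth, sigma, rng=None):
    """Corrupt a ground-truth planning horizon with Gaussian random walk
    noise, so that forecast error grows with lead time tau (a sketch of
    the Eq. 4 scheme; the paper's exact formulation may differ)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = np.cumsum(rng.normal(0.0, sigma, size=len(ground_truth)))
    return ground_truth + noise
```

Sweeping `sigma` over a range and feeding the resulting forecasts to the MPC scheme produces the accuracy-versus-performance curves of Fig. 10.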
The results show that the tested MPC scheme is most sensitive to the forecast accuracy of the grid electricity price and carbon intensity variables, suggesting that the most resources and expense should be invested in producing accurate forecasts of grid conditions. Additionally, whilst the prediction models tested in this study performed worst when forecasting solar generation (see Fig. 2b), it may not be worth expending additional resources to improve these forecasts, as the MPC scheme is least sensitive to this variable.
Though the grid component of operational performance is found to be by far the most sensitive to forecast accuracy, this is likely an artefact of the synthetic forecast model used, and is not considered to be reflective of the behaviour of real prediction models.

Conclusions
This study investigated the impacts of data on the prediction accuracy of forecasting models for building operational conditions, and the resulting operational performance of a Model Predictive Control (MPC) scheme in a multi-building energy system with distributed generation and storage. Experiments were conducted using a large-scale dataset of electrical load measurements from buildings in the Cambridge University Estates.
A simple linear multi-layer perceptron model (Linear) using DMS prediction was found to achieve forecast accuracy equivalent to high-complexity, state-of-the-art machine learning models in a setting without data limitations, while offering substantial advantages in data efficiency and generalisation performance.
Therefore, this simple neural model is preferable for use in practical MPC systems due to its lower data and computational requirements, and better performance on new load dynamics.
Using more than 2 years of hourly resolved training data did not provide significant improvements in prediction accuracy for most of the models tested, indicating that the collection of monitoring data over longer durations is unnecessary for the development of performant MPC schemes. Further, screening training data using change-point analysis to remove low-similarity data was able to simultaneously improve data efficiency and prediction accuracy, provided at least 1 year of training data was kept. Data efficiency could also be improved through the removal of redundant data features from models.
The reuse of Linear models for load prediction between buildings was shown to be an effective way of reducing data collection requirements. When selecting models for reuse using a proposed load profile similarity metric, based on the Wasserstein distance between fPCA coefficient distributions, model reuse led to an average 11% increase in prediction error. In comparison, models trained using only 3 months of building-specific data provided forecasts with an average 9.9% increase in error over the baseline. Additionally, online training of prediction models was shown to be highly beneficial, with higher update frequencies providing greater forecast accuracy improvements, of up to 15.7% in the case of monthly updating. Hence, monitoring data should be used to update reused prediction models in situ to tailor them to the building system, ultimately replacing the reused model. The results suggest that, by exploiting existing building energy datasets to pre-train models, sufficient forecast accuracy can in many cases be achieved without the collection of any building load data prior to the installation of the system.
The relationship between forecast accuracy and the operational performance of the MPC controller was investigated using synthetic forecasts, which showed that the MPC scheme is most sensitive to grid electricity price and carbon intensity prediction accuracy. This analysis methodology would allow decision makers to determine whether expenditure on data collection to produce forecasting models is economic for a practical building energy system, i.e. whether the costs of data are outweighed by the improvements in operational performance provided.
data sample that describe how the data sample maps to the mean function. The warping function describes the phase relationship, i.e. the variation in time, and the amplitude function describes changes in magnitude. The warping and amplitude functions are then analysed separately, and fPCs are generated for both. The approach is illustrated schematically in Fig. D.12, and full details of the approach are described in Ward (2021) [66].
Section 3 studies the impact of various aspects of data on forecasting accuracy. The effect of the volume of data used for model training is studied in Section 3.3.1, to support decision making on the quantity of data that should be collected for model development. A change-point detection based methodology for screening training data is proposed, and its ability to improve prediction accuracy whilst reducing training data durations is investigated. The selection of model data features and the use of online model training are considered in Sections 3.3.2 and 3.3.3, respectively. Section 3.4 contextualises the study of model forecast accuracy in the building energy system control task by quantifying the relationship between forecast accuracy and the resulting MPC operational performance for synthetic noisy forecasts. Finally, conclusions are drawn in Section 4.

Figure 1: Energy flow schematic for test multi-building energy system. (Icon credits: Symbolon)
Figure 2 panel captions: comparison of baseline load prediction accuracy of forecasting models; correlation between model prediction accuracy and train-test data similarity metric value (lower Wasserstein metric means more similar data); comparison of MPC operational performance using baseline forecasting models.

Figure 2: Comparison of baseline prediction accuracy and operational performance of forecasting models.

Figure 3: Comparison of forecasting model generalisation. Violin plot with quartiles indicated by horizontal lines.
Figure 4: Correlation between generalisation of the Linear model and the Wasserstein similarity metric.

Figure 5: Average improvement in prediction accuracy over all prediction variables with years of data used for training.
Figure 6: Relative improvement in prediction accuracy of models trained on data screened using change-points.

Figure 7: Pearson correlation between data variables for Building 0.

Figure 8: Variation of prediction accuracy with number of data features included in Linear model.

Online training
During building operation, monitoring systems continuously collect operational data. The characteristics of building behaviour can change during operation due to external factors such as weather and climate, occupancy, and equipment degradation and maintenance. Continuous online training updates the prediction models using this recent data, allowing them to track such changes.

Figure 9: Variation of prediction accuracy with online update frequency.

Figure 10: Sensitivity of operational performance to synthetic forecast noise simulating prediction inaccuracy.

Fig. D.13 illustrates the first two phase and amplitude fPCs extracted from the dataset. Fig. D.14 shows distributions of example fPC coefficients for Buildings 38 and 49. The load data for Building 49 exhibits a much smaller range than that for Building 38, which results in a more positive distribution of V1 fPC coefficients. In Fig. D.13, positive V1 fPC coefficients can be seen to reduce the data range.

Figure D.16: Similarity metric scores between training datasets for pairs of buildings.

Figure D.17: Similarity metric scores between pairs of training, validation, and test datasets for each building.

Figure E.18: Change-points detected using the BEAST algorithm in the load profiles of Buildings 26 & 44 (top row). The signal is decomposed into a seasonal component (second row), assumed harmonic, with inferred change-points indicated by red vertical lines, and a trend component (fourth row), with green lines indicating change-points. Credible intervals of the estimated signals are given by grey envelopes around the individual components.

Table 1: Description of Linear Program model parameters (fragment).
— (kWh): energy intake to battery unit in building i at time τ in planning horizon
SoC_i[τ] (kWh): state-of-charge of battery unit in building i at time τ in planning horizon

Table 2: Computation times for baseline models, trained on 8 years of data and predicting for simulations of 2 years duration.