Sensitivity analysis to reduce duplicated features in ANN training for district heat demand prediction

Artificial neural networks (ANN) have become an important method for modelling the nonlinear relationships between weather conditions, building characteristics and heat demand. Due to the large amount of training data required for ANN training, data reduction and feature selection are important to simplify the training. However, in building heat demand prediction, many weather-related input variables contain duplicated features. This paper develops a sensitivity analysis approach to analyse the correlation between input variables and to detect variables that have high importance but contain duplicated features. The proposed approach is validated in a case study that predicts the heat demand of a district heating network containing tens of buildings at a university campus. The results show that the proposed approach detected and removed several unnecessary input variables and helped the ANN model reduce training time by approximately 20% compared with traditional methods while maintaining the prediction accuracy. This indicates that the approach can be applied to analyse large numbers of input variables and help improve the training efficiency of ANN in district heat demand prediction and other applications.

Highlights
• ANN models the relationship between data of weather, building and heat demand.
• Weather-related input variables contain duplicated features.
• The proposed approach helps the ANN model to reduce the training time.
• The approach is validated in a case study of predicting the district heat demand.

Introduction
In recent decades, much effort has been made to reduce greenhouse gas emissions and the consumption of fossil fuels to mitigate global environmental degradation and warming [1]. Many countries have deployment targets for low-carbon technologies such as photovoltaic panels, electric vehicles and heat pumps with the aim of cutting CO2 emissions [2]. In terms of global energy consumption, buildings are becoming an important sector in current and future energy landscapes [3][4][5]. In Europe, the energy consumption of buildings has steadily increased. Predicting building electricity power consumption is difficult due to the complexity of the systems inside the building and the variety of loads, especially electric appliances [7]. Therefore, the accurate prediction of building heat demand, regardless of how that demand is supplied, becomes the target [10].
The modelling of building heat demand can be classified into three main categories: engineering modelling methods, data-driven methods, and their hybrids [11,12] . Engineering methods develop bottom-up building models and simulate the heat transfer process based on physical principles [13] . Most building energy simulators use bottom-up building models and heat transfer principles to simulate the energy consumption, e.g. TRNSYS, EnergyPlus and Integrated Environmental Solutions Virtual Environment (IES-VE) [14,15] . However, building energy simulators require a detailed description of the building to account for the end-use heat demand and have long simulation times, especially for large energy networks. The required number and accuracy of parameters and computational cost are the main drawbacks of these methods. For these reasons, data-driven methods have become popular as an alternative modelling approach to predict the heat demand of buildings.
The data-driven models are developed using statistical methods to fit the input parameters to outputs without any knowledge of their physical relationship, a so-called 'black-box' model. The input parameters for building energy consumption include environmental parameters, such as temperature, solar radiation, humidity and atmospheric pressure, building design parameters, such as the percentage area of windows, the thermal properties of walls and the building orientation, and also occupant behaviour [16]. Apart from these, recent research indicates that time series data, such as the time of day and the day of the week, is also important as an input variable of data-driven models [4]. That is because the heating time is normally correlated with both the indoor temperature and the heating mode, which can be seen as a kind of occupant behaviour. Compared with conventional regression modelling techniques, artificial intelligence (AI) techniques are known to perform more reliably and efficiently in many modelling tasks [17][18][19]. The commonly used AI techniques include genetic algorithms, support vector machines and artificial neural networks (ANN) [20,21]. ANN has become one of the most important methods in empirical nonlinear modelling and is widely used to model complex functional relationships with weather conditions and building characteristics as inputs and heat demand as outputs [16,22]. The commonly used measures for validating the results of ANN models include the mean absolute error (MAE), mean square error (MSE), correlation (R) and coefficient of determination (R2) [23]. The major advantages of ANN are its very low model construction cost and its ability to flexibly map inputs to outputs for complex systems [24]. Whatever the complexity of the target system, the ANN is able to use a simple construction to model its behaviour.
This feature makes ANN widely used to forecast irregular variables, including geographic information such as wind speed [25] and global solar radiation [26], random building energy usage such as electricity usage [9], cooling load [27] and energy consumption [17,22], and power generation systems such as photovoltaic [28,29] and PV/thermal systems [23].
Although the construction of ANN has these advantages, its shortcomings are the large amount of data required for training, the long training time, the high risk of overfitting and the difficulty of interpreting the knowledge gained by 'black-box' models [30]. A common way to reduce these shortcomings is to delete unimportant data components in the training sets to obtain smaller networks, reduced-size data vectors and minimised redundancy in the training data [31,32]. This can be achieved by analysing the total disturbance of network outputs due to perturbed inputs [32]. Reducing the number of inputs to an ANN model by selecting key variables is known as feature selection, which aims at identifying the most relevant input features within a dataset [33][34][35]. Researchers have analysed different methods of feature selection for ranking and identifying important inputs, such as sensitivity analysis, fuzzy curves, and the change of MSE [31].
Sensitivity analysis identifies which input parameters are important for the prediction of the output variable and also quantifies how changes in the values of the input variables alter the value of the output variable [30,36]. Several methods have been proposed to explain the contribution of variables in ANN models, including the adaptation of their connection weights [37], a fictitious input matrix considering a successive variation of one input variable while the others are kept constant [38], the connection weights selected by a randomization approach [39], a perturbation of the input variables [40], and the partial derivatives of the output with respect to the input variables using the connection weights of the ANNs [41,42]. Typical model sensitivity analyses are 'one-at-a-time' simulations that evaluate the impact of each input in turn and ignore the interactions with other input variables [42,43]. However, in predicting district heat demand from given weather information, it is found that many input variables contain duplicated features even if they show high importance in the sensitivity analysis, indicating that the training data can be further simplified to reduce the training cost and the risk of overfitting. Therefore, the correlations among inputs also need to be analysed to simplify the training data and remove the duplicated features.
This paper develops a sensitivity analysis method to rank the input variables and to identify input variables with duplicated features. Both the ranking and the correlation analysis are used to remove features in order to reduce the training data and time, and thus improve the efficiency of ANN while maintaining the prediction accuracy. The proposed approach analyses the correlation among inputs by calculating the coefficient of determination of each variable with all others. The results are used to remove variables with low importance as well as variables that have high importance but contain duplicated features. The approach is evaluated in a case study of predicting the heat demand of a district heating network containing tens of buildings at the campus of the University of Glasgow.

Artificial neural network used in building heat prediction
In predicting the energy consumption of a district heating network, using engineering simulation to build a bottom-up model of each building is not efficient because it would require a very large amount of hard-to-get building and occupant activity data, and the simulation would be computationally expensive and time consuming. The motivation of this paper is to build an ANN model that is able to predict the heat demand of different types of buildings. If the model is trained only with the weather profile, without the building characteristics, it will only be able to predict the heat demand of a particular building. Therefore, the building characteristics are added as inputs in addition to the weather conditions in the ANN training. After the model is well trained, it can be used to predict the heat demand of other buildings by changing the inputs to the characteristics of the new buildings. Thus, the ANN model is expected to predict the heat demand of a district containing dozens of buildings after being trained with the environment profile as well as the building characteristics and heat demand of some sample buildings, as shown in Fig. 1.

Artificial neural network model
A neural network is a computational model for nonlinear data fitting that typically includes an input layer, hidden layers, and an output layer [20]. There can be one or more hidden layers depending on the complexity of the model and the training data. Each layer has several neurons, and every neuron is connected to the output of all the neurons in the previous layer through adaptable synaptic weights [44]. The training of an ANN uses a group of input patterns in a data mapping process to produce the dependent variables for the corresponding inputs [45]. In the ANN data mapping process, the neurons in the input layer are multiplied by the weights of the corresponding neurons in the hidden layer and then summed up with a bias towards the neurons in the output layer [22]. The predicted results are compared with the historical data, and their errors are used to update the neuron weights by suitable adaptation [20].
The process of the ANN can be described mathematically. Define $x_k$ ($k = 1, 2, \ldots, n$) as the $k$-th input attribute value, which is passed along the links to the other layers. The weighted sum of signals, $\Sigma$, arriving at the input of the next neuron is subjected to a transfer function, most commonly the sigmoid function [46]:

$$f(\Sigma) = \frac{1}{1 + e^{-\Sigma}}$$

The $j$-th hidden neuron $h_j$ ($j = 1, 2, \ldots, m$) receives the sum of the input values multiplied by the weights $w_{jk}^{(2)}$ and bias $b_j^{(2)}$ associated with the links:

$$h_j = f\left(\sum_{k=1}^{n} w_{jk}^{(2)} x_k + b_j^{(2)}\right)$$

The output neurons $y_i$ ($i = 1, 2, \ldots, p$) sum up their input signals and apply the activation transfer function:

$$y_i = f\left(\sum_{j=1}^{m} w_{ij}^{(1)} h_j + b_i^{(1)}\right)$$

where $f$ is the activation function, the sigmoid function in this paper; $w^{(1)}$, $b^{(1)}$, $w^{(2)}$ and $b^{(2)}$ are the weights and biases linked to the output layer (1) and hidden layer (2), respectively. This is a typical two-layer ANN model with an output layer and one hidden layer.
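As an illustration, the forward pass described above can be sketched in Python; the layer sizes and weight values here are arbitrary toy choices, not parameters from the case study:

```python
import numpy as np

def sigmoid(s):
    """Transfer function f(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W2, b2, W1, b1):
    """Two-layer ANN forward pass.

    x  : (n,)   input attribute values
    W2 : (m, n) hidden-layer weights, b2 : (m,) hidden-layer biases
    W1 : (p, m) output-layer weights, b1 : (p,) output-layer biases
    Returns hidden activations h and outputs y.
    """
    h = sigmoid(W2 @ x + b2)   # j-th hidden neuron value
    y = sigmoid(W1 @ h + b1)   # i-th output neuron value
    return h, y

# toy example: 3 inputs, 2 hidden neurons, 1 output
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
W1, b1 = rng.normal(size=(1, 2)), np.zeros(1)
h, y = forward(np.array([0.2, 0.5, 0.1]), W2, b2, W1, b1)
```

Because the sigmoid is applied at every neuron, all hidden and output values lie strictly between 0 and 1, which is one reason the training data is normalised beforehand.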
The error between the target vector and the predicted outputs of the ANN model is used to validate the training performance. The appropriate error function is the mean square error (MSE), defined using the differences between the output vector $y_i$ and the target vector $t_i$ as

$$\mathrm{MSE} = \frac{1}{p} \sum_{i=1}^{p} (t_i - y_i)^2$$

The training error is used to update the ANN parameters, i.e. the weights and biases of each neuron in the hidden and output layers. The training approaches of ANN include general regression, backpropagation (BP), radial basis function and fuzzy inference systems. In this paper, the BP learning algorithm is adopted for a typical two-layer ANN model. The BP algorithm is a supervised iterative training method based on searching for the global minimum of the difference between the ANN output and the target [47]. The errors in the output are propagated back by calculating derivatives that indicate the amount of responsibility of each neuron using the gradient descent method [46]:

$$\delta_i^{(1)} = (t_i - y_i)\, y_i (1 - y_i), \qquad \delta_j^{(2)} = h_j (1 - h_j) \sum_{i=1}^{p} w_{ij}^{(1)} \delta_i^{(1)}$$

where $\delta^{(1)}$ and $\delta^{(2)}$ indicate the responsibilities of the output-layer neurons and hidden-layer neurons, respectively. The weights and biases of the links can then be updated based on these responsibilities [48,49], e.g.

$$w_{ij}^{(1)} = w_{ij}^{(1)} + \eta\, \delta_i^{(1)} h_j, \qquad w_{jk}^{(2)} = w_{jk}^{(2)} + \eta\, \delta_j^{(2)} x_k \tag{8}$$

where $\eta$ is the learning rate of the BP neural network. Apart from the methods and algorithms of machine learning in data-driven models, the selection of data with different features and sizes used for training is also a vital factor in model performance [20].
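A minimal sketch of one BP iteration for such a two-layer network, assuming the sigmoid activation and the responsibility/update rules above (toy dimensions and a single training pattern, purely illustrative):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def bp_step(x, t, W2, b2, W1, b1, eta=0.5):
    """One backpropagation step; returns updated parameters and
    the MSE measured before the update."""
    # forward pass
    h = sigmoid(W2 @ x + b2)
    y = sigmoid(W1 @ h + b1)
    mse = np.mean((t - y) ** 2)
    # responsibilities (deltas); the sigmoid derivative is f * (1 - f)
    d1 = (t - y) * y * (1.0 - y)          # output-layer neurons
    d2 = (W1.T @ d1) * h * (1.0 - h)      # hidden-layer neurons
    # gradient-descent updates with learning rate eta
    W1 = W1 + eta * np.outer(d1, h)
    b1 = b1 + eta * d1
    W2 = W2 + eta * np.outer(d2, x)
    b2 = b2 + eta * d2
    return W2, b2, W1, b1, mse

# fit a single toy pattern: 2 inputs, 3 hidden neurons, 1 output
rng = np.random.default_rng(1)
x, t = np.array([0.3, 0.7]), np.array([0.9])
W2, b2 = rng.normal(size=(3, 2)), np.zeros(3)
W1, b1 = rng.normal(size=(1, 3)), np.zeros(1)
errs = []
for _ in range(200):
    W2, b2, W1, b1, mse = bp_step(x, t, W2, b2, W1, b1)
    errs.append(mse)
```

Repeating the step drives the MSE down, mirroring the iterative search for the error minimum described above.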

Data collection from building energy simulator
The development of a data-driven model normally consists of data collection and processing as well as model training and testing. The training process of an ANN requires a group of datasets from historical records, which are used as benchmarks to train and test the model's performance. The number of input neurons depends on the number of input variables used for training. In the original data, the input data has 26 different variables including both the weather conditions and building characteristics, as shown in Table 1. The number of hidden neurons is chosen based on the complexity of the target model. The more hidden neurons used, the more complex the model that can be achieved, but the longer the training will take. In this paper, the number of hidden neurons is chosen manually considering both the model complexity and training time. The ANN model is structured with 26 input neurons corresponding to the input variables and 20 hidden neurons for the internal relations of the ANN model. The input data used for training include the corresponding hour of the day and month of the year, selected environmental variables, building characteristics and the heat demand of all target buildings. The building heat demand data is used to calculate the training error of the ANN model output.
Ideally, the most direct way is to measure the end-user heat demand of each building. However, there are many difficulties in measuring the real heat demand of buildings. Firstly, it requires installing a great number of sensors in every part of the buildings; yet, as in the case study at the University of Glasgow (UofG), many buildings are hundreds of years old. These buildings do not have a building management system (BMS) like modern buildings, and it is difficult to retrofit them. In addition, the collection of the required historical data would take at least several months, but ideally several years, of recording data from sensors.
Another way is to measure the heat demand from the supplier. At UofG, the buildings are heated via a district heating network supplied by the energy centre. However, the energy centre can only record the real heat demand of the whole campus, not of each building, which makes it difficult to derive the heat demand of each building from the total demand of the whole campus. Therefore, bottom-up building models are built in the IES-VE software for all real buildings on campus. The building parameters are then calibrated so that the sum of the heat demand of all building models matches the real heat demand data of the whole campus recorded by the energy centre. IES-VE is an integrated system for building bottom-up models of buildings for thermal analysis and heating load simulation using the Apache engine. After setting the latitude and longitude of the target buildings, the weather profile for the building energy simulation is obtained from the weather station at the nearest airport; for this study it is Glasgow airport, which is 9 km from the University of Glasgow. These settings ensure that the simulated result is as close as possible to the measured heat demand. After the model is well calibrated with the data of the whole campus, the simulated heat demand of each building can be obtained. The calibration work and validation against measured data have been published in previous publications [50,51]. The heat demand of each building simulated from this model is therefore trusted to be close to the real heat demand. In this paper, to simplify the data collection process, the training data is taken from simulated results generated in the building energy simulation software IES-VE. The data, which includes the weather profile, building characteristics and simulated heat demand, is used for ANN training.
In addition to the weather profile, different building characteristics also affect the thermal behaviour of a building. The floor area, building volume, window area and type, and wall thickness and material are the key characteristics of a building. Apart from these, the building heat demand also depends on the type of operational function. For example, the normal working hours of an office building are from 8:00 to 18:00, and its heat demand is different from that of a restaurant open from 12:00 to 22:00 or a library open 24 h. The data is collected from the sample building models built in IES-VE, which were calibrated to fit the recorded data of real heat energy consumption [50]. The collected data is normalised before it is used to train the ANN model for the prediction of the heat demand of the district heating network.
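The normalisation step can be sketched as a simple min-max scaling of each input variable to [0, 1]; the scaling ranges are returned so that new building data can be scaled consistently (the sample matrix is purely illustrative):

```python
import numpy as np

def minmax_normalise(X):
    """Scale each column (input variable) of X to [0, 1].

    Returns the scaled matrix plus the per-column minimum and span,
    which are needed to apply the same scaling to unseen data.
    Constant columns are mapped to 0 to avoid division by zero.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span, lo, span

# illustrative raw data: two variables on very different scales
X = np.array([[2.0, 100.0],
              [4.0, 300.0],
              [6.0, 200.0]])
Xn, lo, span = minmax_normalise(X)
```

Scaling all variables to a common range prevents inputs with large numeric ranges (e.g. building volume) from dominating the sigmoid activations of inputs with small ranges (e.g. solar altitude).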

Sensitivity analysis of input variables to heat demand
The collected data of weather information and building characteristics are defined as input variables to the ANN model. To evaluate the effect of the model variables, a sensitivity analysis has been designed to provide the elementary effect of each variable. The increase or decrease of each variable then provides a clear effect on the output. The input and output data used in the sensitivity analysis should be exactly the same as those used for ANN training. In the original data, the inputs of the sensitivity analysis are the 26 variables including both the environmental conditions and building characteristics, while the outputs are the heat demand of sample buildings randomly chosen from the campus.
The commonly used global sensitivity analysis method is the Morris method, which has been used to test the elementary effects in this case. The basic idea of the Morris method is to evaluate the response of the model output to a small change in a single input variable. The mean elementary effect of a single variable $x_k$ over the complete set of data points of size $U$ is presented as

$$\mu_k = \frac{1}{U} \sum_{u=1}^{U} \frac{y(x_1, \ldots, x_k + \Delta, \ldots, x_n) - y(x_1, \ldots, x_k, \ldots, x_n)}{\Delta} \tag{11}$$

However, in real engineering applications, the input variables normally vary simultaneously with time, and it is difficult to observe the effect of a small change in only a single variable on the target variable. Some research analysing ecological data has recognised this potential weakness of perturbing a single factor at a time [30]. For inputs $x = (x_1, \ldots, x_n)$ and output $y$, the change of the output at time step $q$ compared with that at time step $q - 1$ can be presented as

$$\Delta y(q) = y(q) - y(q-1) \tag{13}$$

Combined with Eq. (11) with $\Delta = 1$, the output change of Eq. (13) can be represented by the elementary effects $e_k$ as

$$\Delta y(q) = \sum_{k=1}^{n} e_k\, \Delta x_k(q) \tag{14}$$

In the standard Morris method, the mean value averages the elementary effect of a single variable in fitting the data. To calculate the elementary effects of multiple variables, the problem becomes that of finding a vector of $e_k$ that fits the relationship between the changes in each variable and the output:

$$\Delta Y = \Delta X\, e \tag{15}$$

where $\Delta X$ is the $U \times n$ matrix of input changes and $\Delta Y$ the vector of output changes. Normally, the simplest way is to calculate the inverse matrix of $\Delta X$ and its product with the output. However, the inverse can only be calculated if the matrix is square and non-singular, which is difficult and not always satisfied in data regression problems. The common approach for such a linear system is to find the least-squares solution that minimises the fitting error over all data points used for the sensitivity analysis. The most suitable $e^*$ can then be worked out to solve (15).
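The least-squares estimation of the elementary effects from time-step differences can be sketched as follows; the synthetic data and the `true_e` vector are illustrative assumptions, not values from the case study:

```python
import numpy as np

rng = np.random.default_rng(2)
U, n = 200, 3                        # number of time steps and input variables
true_e = np.array([2.0, -0.5, 0.0])  # hidden elementary effects (assumed)

# synthetic simultaneous input trajectories and a noisy linear response
x = rng.normal(size=(U + 1, n))
y = x @ true_e + 0.01 * rng.normal(size=U + 1)

# time-step differences of inputs and output
dx = np.diff(x, axis=0)              # (U, n) matrix of input changes
dy = np.diff(y)                      # (U,)  vector of output changes

# least-squares solution e* minimising ||dx @ e - dy||^2,
# avoiding the inverse of a non-square matrix
e_star, *_ = np.linalg.lstsq(dx, dy, rcond=None)
```

Because `dx` is tall rather than square, `lstsq` replaces the matrix inverse with the minimum-error solution, exactly the workaround described above.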

Determine the correlation of input variables
To find the correlation between input variables and outputs, the most common method is regression. Linear regression is a powerful tool that uses mathematical manipulations to transform the relationship between dependent and independent variables into a linear form. Based on this, many procedures were developed to derive the equation of a straight line using the least-squares criterion for calibration. However, most engineering data is poorly represented by a straight line.
An alternative calibration is to fit polynomials to the data using polynomial regression, where the simplest is quadratic regression [52] . The quadratic regression ensures that the first-order derivative is continuous. The least-squares procedure can easily be extended to fit the data to a 2nd-order polynomial as the quadratic least square regression (QLSR) approach. The QLSR has the advantage to integrate both the convergence property of least squares and the probabilistic property of fuzzy regression to fit a non-linear mapping [53][54][55] .
Define the quadratic polynomial in an input variable $x_l$ as

$$g_l(x_l) = a_{0,l} + a_{1,l} x_l + a_{2,l} x_l^2 \tag{16}$$

Assume the fitted value of the $k$-th input $x_k$ can be presented as the sum of polynomial equations of the other $n - 1$ input variables:

$$x_k = \sum_{l \neq k} g_l(x_l) + \varepsilon_k \tag{17}$$

where $\varepsilon_k$ is the unique information that is independent of any other input variable. The error between the real and fitted value of each input variable can then be represented by

$$e_k = x_k - \hat{x}_k$$

To calibrate the parameters of the polynomial equation, the estimated values can be obtained by fitting one variable at a time, with the effect of the other variables absorbed into the error. The accumulated square error over $U$ data points is

$$E = \sum_{u=1}^{U} \left(x_k(u) - a_0 - a_1 x_l(u) - a_2 x_l(u)^2\right)^2$$

To minimise the accumulated square error by adjusting the parameters, the partial derivatives of $E$ with respect to each parameter are set to zero:

$$\frac{\partial E}{\partial a_0} = \frac{\partial E}{\partial a_1} = \frac{\partial E}{\partial a_2} = 0$$

giving the most suitable parameters $a_0^*$, $a_1^*$ and $a_2^*$ for the quadratic least-squares regression. The coefficient of determination $R^2$ then shows the quality of the fit of each variable to the target input:

$$R^2 = 1 - \frac{\sum_u \left(x_k(u) - \hat{x}_k(u)\right)^2}{\sum_u \left(x_k(u) - \bar{x}_k\right)^2}$$

Based on the coefficient of determination, the variable with the highest $R^2$ value is chosen to fit the target variable. Thus, one key variable with its optimised parameters is chosen in each iteration to fit the target input variable. The remaining difference between the target input variable and its fitted value is used in the next iteration. The fitting process using QLSR is repeated until the $R^2$ value falls below a threshold. A low $R^2$ value indicates that there is not enough evidence that the remainder is determined by other variables; the assumed form in (17) has then been achieved and the process can stop. A lower threshold causes more iterations in fitting the target variable. However, as the error reduces, excessive iterations give no obvious difference in finding the most related features.
Thus, considering both the performance and the computational load, the threshold is set at 50% in the following case study to obtain an acceptable fitting result. This confirms that the correlation among input variables has been found, as the target feature can be represented by other input variables. The result can then be used to remove duplicated features from the input variables to reduce the training data.
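A sketch of the iterative QLSR procedure under the assumptions above (quadratic fit per candidate variable, highest-R² selection, 50% stopping threshold); the synthetic target built from `x1` and `x2` is purely illustrative:

```python
import numpy as np

def qlsr_fit(target, candidates, r2_threshold=0.5, max_iter=10):
    """Iteratively fit `target` by quadratic regressions on the columns of
    `candidates`: in each iteration pick the variable with the highest R^2,
    subtract its fitted contribution, and repeat until R^2 drops below the
    threshold. Returns the indices of the chosen variables."""
    residual = target.copy()
    chosen = []
    for _ in range(max_iter):
        best_r2, best_k, best_fit = -np.inf, None, None
        for k in range(candidates.shape[1]):
            xk = candidates[:, k]
            # design matrix for a0 + a1*x + a2*x^2, solved by least squares
            A = np.column_stack([np.ones_like(xk), xk, xk ** 2])
            coef, *_ = np.linalg.lstsq(A, residual, rcond=None)
            fit = A @ coef
            ss_res = np.sum((residual - fit) ** 2)
            ss_tot = np.sum((residual - residual.mean()) ** 2)
            r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
            if r2 > best_r2:
                best_r2, best_k, best_fit = r2, k, fit
        if best_r2 < r2_threshold:
            break            # remainder looks independent of all candidates
        chosen.append(best_k)
        residual = residual - best_fit
    return chosen

# illustrative data: the target depends on x1 and x2^2 but not on the noise column
rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=500), rng.normal(size=500)
noise = rng.normal(size=500)
target = 2.0 * x1 + 0.5 * x2 ** 2 + 0.05 * rng.normal(size=500)
chosen = qlsr_fit(target, np.column_stack([x1, x2, noise]))
```

In this toy run the procedure selects the two genuinely related columns and stops once the residual carries no further evidence of dependence, mirroring the duplicated-feature detection described above.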

Input variables and sensitivity analysis
The data used for ANN training is collected from the IES-VE software database containing building characteristics, energy consumption including heat demand, and the weather profile, which is recorded by the nearest airport weather station after setting the location. In total, the weather profiles include 16 different hourly recorded variables, as given in Table 1. A number of input variables are shown in Fig. 2 for a period of 30 days. Fig. 2(a) shows the five types of temperature information: dry-bulb, wet-bulb, dew-point, daily mean and maximum adaptive temperature. The figure shows that the five temperature variables have a similar tendency, i.e. the different temperatures depend on the same weather information. Similar results can also be found for direct radiation, global radiation and diffuse radiation in Fig. 2(b); they have a similar tendency under most circumstances. Fig. 2(c) shows other useful input weather variables, including solar altitude, relative humidity and external moisture content for every hour. They have no obvious dependency on other variables but possibly have hidden and non-linear relationships, which will be discussed later. The ANN training aims to use these weather-based variables along with the building characteristics as inputs and the corresponding hourly heat demand as outputs.
Due to the multiplicity of the input variables, ANN training could take an extremely long time and become less efficient in finding an effective input-output relationship in the ANN model. The sensitivity analysis (SA) introduced in Section 2.4 analyses the sensitivity of each input variable to the output. The SA result for the input variables is shown in Fig. 3. The input variables include both the weather-based variables and the building characteristics. From the SA result, the wet-bulb temperature has the highest sensitivity to the building heat demand, and the top five most sensitive variables are all temperature related. The building characteristic with the highest sensitivity for heat demand prediction is the building volume, while the wind direction is the variable with the lowest sensitivity to heat demand. With the result of the sensitivity analysis, the number of weather input variables can be reduced by choosing the inputs with the highest sensitivity for training.
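The selection step can be sketched as a simple cut-off on the sensitivity magnitudes; the variable names and values below are illustrative stand-ins, not the results shown in Fig. 3:

```python
import numpy as np

# hypothetical sensitivity magnitudes for a few named variables
names = ["wet-bulb temperature", "dry-bulb temperature",
         "building volume", "wind direction", "cloud cover"]
sens = np.array([0.31, 0.28, 0.12, 0.004, 0.05])

cutoff = 0.02  # cut-off criterion of the same order as in the case study
keep = [n for n, s in zip(names, sens) if abs(s) >= cutoff]
```

Variables falling below the cut-off (here, wind direction) are dropped from the training inputs before the correlation analysis of the next section is applied.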

Analysis of correlation among input variables
In the sensitivity analysis, the five input variables with the highest influence on the output heat demand are wet-bulb, dry-bulb, dew-point, daily mean and maximum adaptive temperatures. Even though the five variables represent different types of temperature, they contain key information that affects all temperature-based variables. Thus, if the internal relationship between different input variables can be found, the number of input variables can be further reduced.
As the wet-bulb temperature has the highest influence on the output heat demand in the sensitivity analysis, it is used as an example to show the fitting result. The QLSR method proposed in Section 2.3 is used to find the duplicated features in the target input variable. The fitting result in Fig. 4 shows the wet-bulb temperature on the y-axis and the value of the correlated variables on the x-axis. Fig. 4 shows the fitting result in the first iteration, i.e. the relationship between the wet-bulb temperature and the other fifteen weather-based variables. To make the comparison clearer, the coefficient of determination, also called R-squared (R2), is used to determine the dependency between variables, as shown in Fig. 5(a).
The variable with the highest R 2 value is the dry-bulb temperature which has around 97% determination with wet-bulb temperature. After subtracting the determined part of dry-bulb temperature from the wet-bulb temperature, the remaining part is used as the next target to run the second iteration in QLSR. As shown in Fig. 5 (b)-(d), the next determination variables are the relative humidity and dew-point temperature. In the 4th iteration, the R 2 value has dropped below 35% and the error is less than 0.1% of the nominal temperature range. Then the iteration stops.
The fitting result from QLSR shows that the wet-bulb temperature depends on three other features: the dry-bulb temperature, relative humidity and dew-point temperature, whose weight parameters are also provided by the QLSR fitting. The wet-bulb temperature can then be fitted by the other three features and the fitting weights based on equation (16). The fitted wet-bulb temperature is compared with its real value in Fig. 6. In the figure, the solid black line indicates the real wet-bulb temperature and the red dots indicate the fitted value calculated from the other features. The result shows that the average fitting error, which is given in the bottom figure with the blue dots, is less than 0.5%. This means that the three fitted variables carry more than 99.5% of the information of the wet-bulb temperature. Therefore, it verifies that the wet-bulb temperature can be fully described by the other three features.
If the wet-bulb temperature and the other three features are all chosen as input variables for the ANN training, the output heat demand will be related to both the wet-bulb temperature and the other three features, causing repeated training on the duplicated features. Thus, in order to improve the effectiveness of ANN training, the wet-bulb temperature with its duplicated features can be removed from the training inputs, and the number of variables is then reduced for a faster training speed.
Similar to the QLSR approach used for the wet-bulb temperature fitting, the other variables are also tested for duplicated features before they are used as inputs for ANN training. In the traditional sensitivity analysis result given in Fig. 3, the cut-off criterion is set as 0.02 to allow 15 variables out of 26 to be used for the ANN training. After running the same approach on the 15 variables, the result indicates that 3 of the 15 features can be further removed to reduce the training load of the ANN: the wet-bulb temperature, the maximum adaptive temperature and the moisture content. Using the QLSR approach, they are found to contain duplicated features with other variables and can thus be removed from the training inputs. The other 12 variables, which form the result of the sensitivity analysis with reduced features, are chosen for the ANN training.
It should be noted, however, that the number of variables with duplicated features depends on the collected data and the particular case study. In other cases using different data sources, the removed variables might no longer contain features duplicated by others. Thus, it is necessary to run the proposed sensitivity analysis with reduced features approach for each data source and case study.

ANN training and prediction performance comparison
To verify the effectiveness of the proposed approach, this section gives the comparison among the ANN training result of using all 26 input variables (All), the top 15 input variables from sensitivity analysis (SA), and 12 input variables chosen from the sensitivity analysis with reduced features (SARF).
The ANN is built and trained in Matlab with the built-in neural network toolbox. The ANN models are set with 1 hidden layer of 20 hidden nodes. The data comes from the building models of the University of Glasgow, which comprise 36 different buildings with different heating types, together with the recorded 12-month weather profile. The training of the ANN uses part of the recorded data, including the weather profile of 4 months from January to April, the building characteristics of 10 randomly chosen sample buildings and their heat demand. As the weather profile and heat demand of the first 4 months and 10 sample buildings are used to train the ANN, the remaining 8 months of weather profile are used to test the training performance by predicting the heat demand of the remaining 26 buildings.
Furthermore, the ANN training uses random initial weights and bias values. The stopping criterion is that the gradient of the training error becomes sufficiently small, which means the training error can hardly be reduced further. Since the training performance depends strongly on the initial weights and biases, a single training run is not enough to show and compare the performance for randomly chosen initial weights. To make a fair comparison and to draw a convincing conclusion, the ANN training and heat demand prediction are repeated many times so that the statistical results can be compared to show the average training performance. In the case study, each method for ANN training has been repeated one thousand times and the statistical results of all methods are compared. The probability density functions of the statistical prediction error of heat demand and the training time of the ANN model over the one thousand repeated tests are shown in Fig. 7.
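The repeated-trial protocol can be illustrated with a toy stand-in for one training run. The sketch below is a hypothetical Python analogue, not the toolbox training itself: gradient descent on a simple linear model starting from random initial weights, stopped when the gradient norm is small (mirroring the gradient-based stopping criterion above), repeated over many trials so that the median error, median time and error variance can be compared.

```python
import time
import numpy as np

def train_once(X, y, rng, grad_tol=1e-6, lr=0.01, max_iter=5000):
    """Toy stand-in for one ANN training run: gradient descent on a linear
    model from random initial weights, stopping when the gradient is small."""
    w = rng.normal(size=X.shape[1])               # random initialisation
    for _ in range(max_iter):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        if np.linalg.norm(grad) < grad_tol:       # gradient-based stopping criterion
            break
        w -= lr * grad
    return w

def repeated_trials(X, y, n_trials=100, seed=0):
    """Repeat training with fresh random initialisations and collect statistics."""
    rng = np.random.default_rng(seed)
    errors, times = [], []
    for _ in range(n_trials):
        t0 = time.perf_counter()
        w = train_once(X, y, rng)
        times.append(time.perf_counter() - t0)
        errors.append(np.mean(np.abs(X @ w - y)) / np.mean(np.abs(y)))
    return np.median(errors), np.median(times), np.var(errors)
```

Comparing the medians and variances returned for different input-variable sets is the same statistical protocol as in the case study, only on a much smaller scale.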
Due to the randomness of the ANN training process, the results show some variability; thus, the medians of the statistical results are given and compared. In the result of Fig. 7(a), the median prediction error using the original method is approximately 60%, while those of SA and SARF are around 38%. In Fig. 7(b), the median ANN training time with all variables is about 43 s, the median training time of SA is 26 s and that of SARF is 21 s, i.e. around 20% less training time than SA and 50% less than using all inputs. In addition to the medians of prediction error and training time, the variance of SARF is clearly smaller than that of the case with all input variables and similar to that of SA. This verifies that SARF reduces the uncertainty in both ANN training performance and training time while achieving better performance.
In addition, the result in Fig. 7 shows the ANN training performance using the Levenberg-Marquardt (LM) training function, which is the most widely used training algorithm. However, there are many other training algorithms, including BFGS Quasi-Newton (BFG), resilient backpropagation (RP), scaled conjugate gradient (SCG), conjugate gradient with Powell/Beale restarts (CGB), Fletcher-Powell conjugate gradient (CGF), Polak-Ribière conjugate gradient (CGP), one step secant (OSS), and variable learning rate backpropagation (GDX). The next step is to verify the performance of the developed SARF method with these other training algorithms. Tables 2-4 show the prediction error of all training functions using the original 26 input variables, the 15 input variables from SA, and the 12 input variables from SARF. The indices for performance comparison are the mean, min and max prediction error and its standard deviation (STD), as well as the mean, min and max training time and its STD.
To make the comparison clearer, the mean prediction error, error STD, mean training time and time STD are drawn as bar charts in Fig. 8. The results show the prediction error and time of the different training functions in ANN model training. Among the training functions, BFG, SCG, CGB, CGF and CGP give relatively better prediction results, while RP and GDX require the least training time. For all the training functions used in the above ANN training, SARF uses the least training time while achieving relatively better performance with the least prediction error. This verifies that the proposed SARF method can improve the ANN training efficiency and reduce the training time while obtaining the same or even better performance in heat demand prediction.
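A simplified version of such a benchmark over training functions can be sketched in Python. The two update rules below, plain gradient descent and gradient descent with momentum, are hypothetical stand-ins for the toolbox's training functions (LM, RP, SCG, and so on are not reimplemented here); the sketch only illustrates collecting per-algorithm error and time statistics into one comparison table.

```python
import time
import numpy as np

def run_gd(X, y, w, lr=0.01, steps=2000):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        w = w - lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

def run_momentum(X, y, w, lr=0.01, beta=0.9, steps=2000):
    """Gradient descent with a classical momentum term."""
    v = np.zeros_like(w)
    for _ in range(steps):
        g = 2.0 * X.T @ (X @ w - y) / len(y)
        v = beta * v + g
        w = w - lr * v
    return w

# Stand-ins for the set of training functions being compared.
algorithms = {"GD": run_gd, "GD+momentum": run_momentum}

def benchmark(X, y, seed=0):
    """Run every algorithm from a random start and tabulate error and time."""
    rng = np.random.default_rng(seed)
    table = {}
    for name, fn in algorithms.items():
        t0 = time.perf_counter()
        w = fn(X, y, rng.normal(size=X.shape[1]))
        err = np.mean(np.abs(X @ w - y)) / np.mean(np.abs(y))
        table[name] = {"error": err, "time": time.perf_counter() - t0}
    return table
```

Running the same benchmark once per input-variable set (All, SA, SARF) yields exactly the kind of error/time table summarised in Tables 2-4.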

Conclusion
With the development of ANN technology, sensitivity analysis is necessary to rank the importance of input variables due to the large amount of training data. In predicting district heat demand using weather information, it is found that many input variables contain duplicated features which are not required to train the ANN model. This paper proposed a method able to remove both the variables with low importance and the variables that have high importance but contain duplicated features. The proposed approach analysed the correlation among input variables by computing the coefficient of determination of each variable with respect to the others, based on the fitting error of quadratic least squares regression. The approach is validated in a case study of predicting the heat demand of a district using an ANN model trained on historic data from several sample buildings. The traditional sensitivity analysis method ranked the input variables by their influence on the heat demand, and it was shown that the 15 most important features can predict the district heat demand with the same or even better performance than the complete set of features. The proposed method further removed 3 high-importance variables that are determined by other variables via the determination analysis of each variable. The results show that the proposed method can reduce the training time by around 20% while achieving the same training and prediction performance compared with the traditional sensitivity analysis method. With the developed sensitivity and correlation analysis approach, the training data is simplified and the efficiency of training an ANN model is improved.

Declaration of Competing Interest
The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.