Data Assessment on the relationship between typical weather data and electricity consumption of an academic building in Melaka



Abstract
Exposure to hot and humid weather conditions often leads to a large amount of electricity being consumed for cooling. Heating, ventilation, and air conditioning (HVAC) systems are commonly known to be the largest energy consumers in institutions and other facilities, which raises the question of how weather conditions affect the amount of energy consumed. An academic building is a good example of a facility with a fixed daily operating pattern, measured by the hour, apart from the occasional semester break. It can therefore be assumed that the HVAC services of an academic facility operate on a fixed daily schedule with a similar pattern all year round. This article presents an analysis of the relationship between typical weather data, in the form of a test reference year (TRY), and the electricity consumption of an academic building located at Durian Tunggal, Melaka. Typical weather data representing the period 2010-2018 were generated using the Finkelstein-Schafer statistic (F-S statistic), together with a data set of electricity consumption. Descriptive analysis and correlation matrix analysis were conducted in JASP for two sample data sets, Set A and Set B, with 12 and 108 data points, respectively. The results were contrasting. For Set A, positive correlations were found between (1) mean temperature and electricity consumption and (2) mean rainfall and electricity consumption. For Set B, negative correlations were found between the same two pairs of variables.
a. Graph of the long-term cumulative distribution function (CDF) and short-term CDF for the three weather parameters (mean temperature, mean relative humidity and mean rainfall) and electricity consumption between 2010 and 2018.
b. Graph of the long-term CDF, best CDF and worst CDF for the three weather parameters (mean temperature, mean relative humidity and mean rainfall) and electricity consumption between 2010 and 2018.
c. Flow of an automatic weather system (AWS) consisting of the acquisition electronics and the AWS interface.
d. Autonomous tipping bucket: inside top view and side view.
e. Autonomous thermometer screen: inside view and side view.
How data were acquired
a. Weather data were obtained from the Meteorology Department of Malaysia, based on daily weather data captured by an automatic weather station located at Batu Berendam, Melaka. The weather station was equipped with an autonomous thermometer screen and an autonomous tipping bucket, which collected temperature, relative humidity and rainfall data.
b. Electricity consumption for the main university campus was estimated from the utility bills provided by the utility company, Tenaga Nasional Berhad (TNB).
c. The JASP statistical software was used for further data treatment, in addition to Microsoft Excel.
Data format
Raw Data:
a. Weather data (daily mean temperature, mean relative humidity and rainfall).
b. Monthly electricity consumption.
Analysis:
a. TRY weather data were generated using the Finkelstein-Schafer statistic method.
b. The CDF and the correlation matrix between the TRY weather data and electricity consumption were determined.

Filtered:
The outlier detected in the rainfall readings (-33.3) denotes a trace reading, i.e., rainfall of less than 0.2 mm.

Parameters for data collection
The weather data (daily) and electricity consumption data (monthly) analysed cover a nine-year period (2010-2018).
Description of data collection
a. For the weather data, the parameters analysed were the daily mean temperature, mean relative humidity and rainfall. The temperature and rainfall readings were logged by the AWS.
b. Electricity consumption was measured each month by the utility company before the bill was issued to the university management.
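The trace-reading code described above has to be handled before any statistics are computed. Otherwise a value of -33.3 would enter the rainfall means as negative rainfall. The following is a minimal sketch that assumes the raw daily rainfall values are held in a plain Python list. The article does not say which numeric value replaced the code, so the substitute value (0.0 mm by default) and the function name `clean_rainfall` are hypothetical.

```python
import math

# Code used in the raw rainfall records to denote a trace reading (< 0.2 mm).
TRACE_CODE = -33.3


def clean_rainfall(readings, trace_value=0.0):
    """Return a copy of `readings` with every trace code replaced.

    `trace_value` is an assumption: the source does not state which value
    was substituted, so 0.0 mm is used by default.
    """
    cleaned = []
    for value in readings:
        if math.isclose(value, TRACE_CODE):
            cleaned.append(trace_value)  # trace reading -> chosen numeric value
        else:
            cleaned.append(value)        # ordinary reading kept unchanged
    return cleaned


daily_rain = [0.0, -33.3, 12.4, 3.1]  # illustrative values, not from the data set
print(clean_rainfall(daily_rain))     # [0.0, 0.0, 12.4, 3.1]
```

Keeping the substitute value as a parameter makes it easy to test how sensitive the rainfall statistics are to that choice.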

Value of the Data
• The data provide the actual pattern of weather variables for the Melaka city area, specifically the location close to Batu Berendam, Melaka.
• The data provide a reference for correlating TRY weather data with the energy consumption of academic facilities.
• The data allow researchers to replicate the statistical analysis of this study and serve as a reliable source for further research on the impact of weather variables on the energy performance of a facility.

Data Description
The weather variables in this research were the mean temperature (°C), the mean relative humidity (%) and the mean rainfall (mm, measured 08-08 Malaysia Standard Time (MST)), based on data provided by the Meteorology Department of Malaysia for the Batu Berendam, Melaka weather station. The daily raw data from 1st January 2010 to 31st December 2018 were retrieved from this weather station, whose autonomous measuring instruments record hourly readings through the Automatic Weather System (AWS). From the raw data, a TRY representing the weather over the nine years was generated using the Finkelstein-Schafer statistic method (F-S statistic) to describe the typical weather of this location. It is provided in the supplementary file for the Melaka 2010-2018 test reference year.
The monthly electricity consumption of the selected academic facilities was estimated from the monthly utility bills issued by the national utility company, Tenaga Nasional Berhad (TNB). These data are provided in the supplementary file "TRY weather data vs electricity". The cumulative distribution function (CDF) was then calculated for both the weather data set and the electricity consumption data set in order to compute their respective FS statistics. The supplementary files "graph best and worst weather data" and "graph best and worst electricity consumption" show the graphs of the monthly best and worst weather parameters and electricity consumption, respectively.
Figs. 1, 3 and 5 show the long-term CDF and the short-term CDFs for the sample month of January for the mean temperature, mean relative humidity and mean rainfall, respectively. Figs. 2, 4 and 6 show the long-term CDF together with the corresponding best and worst CDFs for the same three parameters. For instance, Fig. 1 shows the long-term CDF of the temperature variable as a solid line. This long-term CDF was then used to select the best (January 2016) and worst (January 2011) temperature CDFs, as shown in Fig. 2. In other words, the "best CDF" is the candidate whose distribution is most representative of all candidates for that specific month (short-term), because it most closely follows the distribution of the long-term CDF.
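The best/worst selection described above can be sketched in Python. This minimal illustration assumes that the daily readings for one calendar month are grouped by year in a dictionary. The empirical CDF used here (the fraction of readings less than or equal to x) is one common convention. The FS statistic is taken as the mean absolute difference between the long-term and candidate CDFs, evaluated at the candidate's own readings. The helper names are hypothetical and are not taken from the article.

```python
import numpy as np


def ecdf(sample, points):
    """Empirical CDF of `sample` at `points`: fraction of sample values <= x."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(ordered, points, side="right") / ordered.size


def fs_statistic(candidate, long_term):
    """Mean absolute difference between the long-term CDF and the candidate
    (short-term) CDF, evaluated at the candidate's own daily readings."""
    candidate = np.asarray(candidate, dtype=float)
    delta = np.abs(ecdf(long_term, candidate) - ecdf(candidate, candidate))
    return float(delta.mean())


def best_and_worst(month_by_year):
    """Rank the candidate years of one calendar month by FS statistic.

    `month_by_year` maps year -> daily readings for that month in that year.
    Returns (best_year, worst_year, fs_by_year); ties resolve to the first
    key in insertion order.
    """
    long_term = np.concatenate(
        [np.asarray(v, dtype=float) for v in month_by_year.values()]
    )
    fs_by_year = {year: fs_statistic(v, long_term) for year, v in month_by_year.items()}
    best = min(fs_by_year, key=fs_by_year.get)   # closest to the long-term CDF
    worst = max(fs_by_year, key=fs_by_year.get)  # furthest from the long-term CDF
    return best, worst, fs_by_year


# Illustrative January readings (not from the Batu Berendam data set).
january = {
    2012: [26.0, 26.5, 27.0, 27.5, 28.0],
    2013: [30.0, 31.0, 32.0, 33.0, 34.0],
    2014: [26.0, 26.5, 27.0, 27.5, 28.0],
}
best, worst, fs = best_and_worst(january)
print(best, worst)  # 2012 2013
```

The unusually warm illustrative year has the largest FS value and is therefore flagged as the worst candidate.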
Tables 1-3 list the FS values for the mean temperature (Table 1), mean relative humidity (Table 2) and mean rainfall (Table 3). The monthly electricity consumption data were treated in the same manner, and their CDFs are presented in Figs. 7 and 8. Fig. 7 shows the long-term CDF together with the short-term CDFs of the individual years, while Fig. 8 shows the best and worst CDFs of the electricity consumption parameter. The best and worst years of the data set were determined using the same FS statistic that was used to generate the typical weather data. With the long-term CDF as the reference, the best year is 2010 and the worst year is 2016.
Fig. 1. Graph of the long-term CDF and short-term CDF (January) for the mean temperature.
The open-source statistical software JASP was used to perform the descriptive statistical analysis and the correlation matrix analysis. Tables 4 and 5 present the descriptive statistics for the two data sets, Set A and Set B. Set A consists of 12 data points, while Set B consists of 108 data points. The sets differ in size because Set A represents the typical weather data for each month of the year, while Set B represents the typical weather data for the respective day of a month over the nine years. Nevertheless, the descriptive statistics of Set A and Set B are reasonably similar. This is because the electricity consumption sample used was the best representative selected from the nine years using the method described in the previous section, i.e., the distribution that best mimics the overall distribution of electricity consumption. The same explanation applies to the typical weather data (TRY) used in the correlation analysis for both sets. Regarding skewness, the distributions of the mean temperature, mean relative humidity and electricity consumption have a longer left tail; in other words, they are negatively skewed. In contrast, the mean rainfall distribution is positively skewed, with a longer right tail and the bulk of the data concentrated on the left.
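The descriptive statistics and the skewness interpretation above can be reproduced with NumPy and SciPy. The following is a minimal sketch that assumes the bias-corrected sample skewness and excess kurtosis estimators. Whether JASP uses exactly these estimators is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def describe(values):
    """Mean, SD, skewness, kurtosis, minimum and maximum of one variable."""
    x = np.asarray(values, dtype=float)
    return {
        "mean": float(x.mean()),
        "sd": float(x.std(ddof=1)),                      # sample standard deviation
        "skewness": float(stats.skew(x, bias=False)),    # adjusted Fisher-Pearson
        "kurtosis": float(stats.kurtosis(x, fisher=True, bias=False)),  # excess
        "min": float(x.min()),
        "max": float(x.max()),
    }


def skew_direction(skewness):
    """Negative skew means a longer left tail; positive skew a longer right tail."""
    if skewness < 0:
        return "negative (longer left tail)"
    if skewness > 0:
        return "positive (longer right tail)"
    return "symmetric"


# Illustrative rainfall-like series: mostly small values and one heavy day.
summary = describe([0.0, 0.0, 1.0, 2.0, 3.0, 40.0])
print(skew_direction(summary["skewness"]))  # positive (longer right tail)
```

A series with many dry or light-rain days and a few very wet days gives this right-tailed shape, matching the interpretation of the rainfall parameter given above.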
The results of the correlation analysis for Set A and Set B are presented in Tables 6 and 7, respectively. For Set A (Table 6), all three correlation tests show a negative correlation between (1) mean temperature and mean relative humidity and (2) mean relative humidity and mean rainfall. Conversely, all three correlation measures show a positive correlation between (1) mean temperature and mean rainfall, (2) mean temperature and electricity consumption and (3) mean rainfall and electricity consumption. For mean relative humidity and electricity consumption, Pearson's r gives a negative correlation, whereas Spearman's rho and Kendall's tau give a positive correlation coefficient. The correlation analysis for Set B (Table 7) shows a largely similar outcome. The exception is that all three correlation measures show a negative correlation between (1) mean temperature and electricity consumption and (2) mean rainfall and electricity consumption.
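The three correlation measures named above can be computed pairwise with SciPy, which also shows how Pearson's r can disagree in sign with the rank-based measures. This is a minimal sketch, not the JASP procedure itself. The function name and the illustrative data are hypothetical. The example uses a single extreme value to make Pearson's r negative while Spearman's rho and Kendall's tau stay positive.

```python
from itertools import combinations

from scipy import stats


def correlation_table(variables):
    """Pairwise Pearson's r, Spearman's rho and Kendall's tau.

    `variables` maps a variable name to a sequence; all sequences must have
    the same length.
    """
    table = {}
    for a, b in combinations(variables, 2):
        x, y = variables[a], variables[b]
        r, _ = stats.pearsonr(x, y)       # linear association
        rho, _ = stats.spearmanr(x, y)    # monotonic association on ranks
        tau, _ = stats.kendalltau(x, y)   # concordant vs discordant pairs
        table[(a, b)] = {
            "pearson_r": float(r),
            "spearman_rho": float(rho),
            "kendall_tau": float(tau),
        }
    return table


# Illustrative data: one extreme value pulls Pearson's r below zero.
demo = correlation_table({"x": [1, 2, 3, 4, 5, 6], "y": [2, 3, 4, 5, 6, -30]})
print(demo[("x", "y")])
```

Because Pearson's r uses the raw values and the other two measures use only their order, a single extreme month can produce the kind of sign disagreement reported for mean relative humidity and electricity consumption.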
Finally, a Bayesian pairwise correlation analysis was carried out for both data sets. It focused on Pearson's r, reported the Bayes factor BF10 and its credible interval under the alternative hypothesis, and was accompanied by the corresponding scatter plots. The alternative hypothesis states that there is a positive association between the two variables tested. Tables 8 and 9 show the scatter plots for the Pearson's r correlation analysis of Set A and Set B, respectively. The plots visualise the data underlying each Pearson's r coefficient, which assumes continuous data, a linear relationship between the two variables and no outliers. The pairwise correlation between the following parameters was therefore plotted for both Set A and Set B: a) temperature and relative humidity, b) temperature and rainfall, c) temperature and electricity consumption, d) relative humidity and rainfall, e) relative humidity and electricity consumption and f) rainfall and electricity consumption. The prior and posterior plot for this Bayesian pairwise correlation is read as follows. If the density of the dotted prior distribution at ρ = 0 is higher than that of the solid-line posterior distribution at ρ = 0, BF10 favours the alternative hypothesis (positive correlation), and vice versa. The dotted and solid lines of the prior and posterior plots in Table 8 show that (a) temperature is negatively correlated with relative humidity, (b) temperature is positively correlated with rainfall, (c) temperature is positively correlated with electricity consumption, (d) relative humidity is negatively correlated with rainfall, (e) relative humidity is negatively correlated with electricity consumption and (f) rainfall is positively correlated with electricity consumption. In contrast, the prior and posterior plots in Table 9 show that (a) temperature is negatively correlated with relative humidity, (b) temperature is positively correlated with rainfall, (c) temperature is negatively correlated with electricity consumption, (d) relative humidity is negatively correlated with rainfall, (e) relative humidity is positively correlated with electricity consumption and (f) rainfall is negatively correlated with electricity consumption.
Table 8
Scatter plot for the correlation analysis Set A.
Table 9
Scatter plot for the correlation analysis Set B.
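The Bayes factor behind these plots can be approximated outside JASP. The sketch below assumes a uniform prior on the population correlation ρ over [-1, 1], i.e., a stretched beta prior of width 1. It uses the exact sampling distribution of Pearson's r, integrated numerically on a grid. The article does not report JASP's prior settings, so the prior choice and the function names are assumptions. By the Savage-Dickey ratio, BF10 equals the prior density at ρ = 0 divided by the posterior density at ρ = 0, which is the comparison of prior and posterior heights described above. The `"greater"` option gives the one-sided BF+0 for a positive association.

```python
import numpy as np
from scipy.special import hyp2f1


def log_likelihood(rho, r, n):
    """Log-likelihood of the population correlation rho given a sample r from
    n pairs, using the exact (Hotelling) form up to an additive constant."""
    return ((n - 1) / 2 * np.log1p(-rho**2)
            - (n - 1.5) * np.log1p(-rho * r)
            + np.log(hyp2f1(0.5, 0.5, n - 0.5, (1 + rho * r) / 2)))


def bayes_factor_pearson(r, n, alternative="two-sided", grid_size=20001):
    """BF10 ("two-sided") or BF+0 ("greater") under a uniform prior on rho."""
    if alternative == "two-sided":
        lo, density = -1.0, 0.5   # uniform prior on [-1, 1]
    elif alternative == "greater":
        lo, density = 0.0, 1.0    # uniform prior on [0, 1]
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    rho = np.linspace(lo, 1.0, grid_size)
    rho = rho[np.abs(rho) < 1.0]  # likelihood vanishes at rho = +/-1
    # Likelihood ratio L(rho) / L(0), weighted by the prior density.
    integrand = np.exp(log_likelihood(rho, r, n) - log_likelihood(0.0, r, n)) * density
    # Trapezoidal rule for the marginal likelihood ratio.
    return float(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(rho)))


def posterior_density_at_zero(r, n):
    """Savage-Dickey: posterior density at rho = 0 = prior density / BF10."""
    return 0.5 / bayes_factor_pearson(r, n)


print(bayes_factor_pearson(0.287, 12))
print(bayes_factor_pearson(0.287, 12, alternative="greater"))
```

A value above 1 indicates that the prior density at ρ = 0 exceeds the posterior density there, i.e., evidence in favour of the alternative hypothesis.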
Based on the correlation analyses above (Pearson's r, Spearman's rho, Kendall's tau and the Bayesian pairwise correlation), it can be inferred that for data Set A, a medium positive correlation (0.312 for Spearman's rho and 0.287 for Pearson's r) was detected between mean temperature and electricity consumption. For data Set B, a weak positive correlation (0.044 for Spearman's rho and 0.036 for Kendall's tau) was found between mean relative humidity and electricity consumption.
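How much weight a coefficient of a given size carries also depends on the number of data points (n = 12 for Set A, n = 108 for Set B). For Pearson's r, the usual frequentist check is the t-test with t = r·√(n − 2)/√(1 − r²) and n − 2 degrees of freedom. The sketch below implements that standard test. The article does not state that this test was applied, and the function name is hypothetical.

```python
import math

from scipy import stats


def pearson_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using t = r*sqrt(n-2)/sqrt(1-r^2)
    with n - 2 degrees of freedom."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    return 2 * stats.t.sf(abs(t), df=n - 2)


print(pearson_p_value(0.287, 12))   # coefficient quoted for Set A
print(pearson_p_value(0.287, 108))  # same coefficient with the Set B sample size
```

The same coefficient gives a much smaller p-value at n = 108 than at n = 12. This is why the sample size is worth reporting alongside each coefficient.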

Experimental Design, Materials and Methods
The weather data analysed in this study consisted of three parameters: 24-hour mean temperature (°C), 24-hour mean relative humidity (%) and 08-08 MST rainfall (mm). All three parameters were obtained from one of the main automatic weather stations operated by the Meteorology Department of Malaysia, at Batu Berendam, Melaka (2°16'N, 102°15'E). The Batu Berendam weather station is the nearest main station, approximately 10.90 km from the main university campus. The use of these data rests on an assumption: even if weather conditions change every 5 km, the difference in climate would not be significant, because the campus lies only about two such 5 km radii from the station. The weather information from this station was therefore taken to represent the weather conditions at the main university campus.
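The station-to-campus distance and the "two radii" argument can be checked with the haversine great-circle formula. The sketch below converts the station coordinates given in the text (2°16'N, 102°15'E) to decimal degrees. The campus coordinates are not given in this section, so they are left as inputs to `haversine_km`. The 6371 km mean Earth radius is an assumption of the spherical model, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def dms_to_decimal(degrees, minutes, seconds=0.0):
    """Convert degrees/minutes/seconds (N or E positive) to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radii_between(distance_km, radius_km=5.0):
    """Number of assumed 5 km 'weather-change' radii spanned by a distance."""
    return distance_km / radius_km


# Batu Berendam station, from the coordinates quoted in the text.
STATION = (dms_to_decimal(2, 16), dms_to_decimal(102, 15))

print(STATION)              # roughly (2.2667, 102.25)
print(radii_between(10.90)) # about 2.2 radii for the quoted 10.90 km
```

With the campus latitude and longitude supplied, `haversine_km(*STATION, campus_lat, campus_lon)` gives the straight-line distance to compare with the quoted 10.90 km.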
Weather data were acquired in two ways: through the hourly-logging AWS, and through 08-08 MST rainfall (mm) readings from manually read instruments. The manual readings also support data validation in case one category of instrument fails. The main station is equipped with autonomous measuring instruments for multiple meteorological parameters:
- a tipping bucket (rainfall)
- a PT100 sensor (wet-bulb and dry-bulb temperature)
- an anemometer (wind direction and speed)
- a solarimeter (solar irradiance)
- a high-volume air sampler (air pollutant index)

In addition, it is equipped with manual meteorological devices:
- a rain gauge (rainfall)
- a thermometer screen (maximum and minimum temperature, dry-bulb and wet-bulb temperature)
- an evaporation pan (rate of natural evaporation)

All readings logged by the autonomous instruments are channelled to the acquisition electronics and monitored on the AWS interface. All meteorological instruments were assigned and set up according to the guidelines of the World Meteorological Organization (WMO). For instance, the instruments were placed at standardised heights, and the distance between instruments was chosen to prevent interference between measurements.
In addition, all meteorological instruments were scheduled for calibration twice a year. Although this exercise focused only on the three parameters mentioned earlier, all meteorological parameters gathered by the weather station are related to some degree in establishing a specific climate condition. It was therefore considered reasonable to make a preliminary assumption that any large fluctuation in any parameter would contribute to a significant correlation, either positive or negative. Figs. 9-11 show the AWS at the Batu Berendam weather station, the autonomous tipping bucket and the autonomous thermometer screen, respectively.
Typical weather data in the form of a TRY are regarded as a reliable way to represent the weather pattern at a specific location [1][2][3][4]. In this article, the Finkelstein-Schafer statistic (F-S statistic) was used to generate the TRY. First, the CDF of each individual month and the long-term CDF were calculated. Next, the difference between the long-term CDF and the CDF of the individual month of interest was calculated using Equation (1). Finally, for each calendar month, the candidate month with the smallest FS statistic was selected, and the selected months together formed the TRY. The resulting TRY was then used for further statistical analysis, as mentioned previously [3,5].
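Equation (1) is not reproduced in this section. The commonly used form of the F-S statistic, which is assumed here, is defined for one weather parameter, calendar month m and candidate year y. Let x_1, ..., x_n be the n daily readings of month m in year y. Let F_LT,m be the long-term CDF built from month m in all years 2010-2018, and let F_m,y be the short-term CDF of that month alone. Each empirical CDF is taken as the fraction of readings less than or equal to x. The statistic is then the mean absolute difference between the two CDFs, and the TRY month is the year that minimises it:

```latex
FS_{m,y} = \frac{1}{n}\sum_{i=1}^{n}\left| F_{\mathrm{LT},m}(x_i) - F_{m,y}(x_i) \right|,
\qquad
y_m^{*} = \arg\min_{y \in \{2010,\dots,2018\}} FS_{m,y}
```

The selected months y_1^*, ..., y_12^* are then concatenated into the 12-month TRY.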
Open-source statistical software proved convenient for data handling and data treatment, so JASP was used to perform the descriptive analysis and the correlation analysis. The descriptive analysis included the key statistics, such as the mean, standard deviation, skewness, kurtosis, and maximum and minimum values [6]. For the correlation analysis, Pearson's r, Spearman's rho and Kendall's tau were computed, followed by a Bayesian pairwise correlation analysis. Two sets of analyses were conducted. The first set (correlation analysis Set A) involved n = 12, which is between the TRY representing