Interpretable Modeling for Short- and Medium-Term Electricity Demand Forecasting

We consider the problem of short- and medium-term electricity demand forecasting by using past demand and daily weather forecast information. Conventionally, many researchers have directly applied regression analysis. However, interpreting the effect of weather on the demand is difficult with the existing methods. In this study, we build a statistical model that resolves this interpretation issue. A varying coefficient model with basis expansion is used to capture the nonlinear structure of the weather effect. This approach results in an interpretable model when the regression coefficients are nonnegative. To estimate the nonnegative regression coefficients, we employ nonnegative least squares. Three real data analyses show the practicality of our proposed statistical modeling. Two of them demonstrate good forecast accuracy and interpretability of our proposed method. In the third example, we investigate the effect of COVID-19 on electricity demand. The interpretation would help make strategies for energy-saving interventions and demand response.


INTRODUCTION
Short-and medium-term demand forecasting with high accuracy is essential for decision making during the trade on electricity markets and operation of power systems. Conventionally, several researchers have used previous demand, weather, and other factors as exploratory variables (e.g., Lusis et al., 2017) and directly applied regression analyses to forecast demand. As methodologies for regression analysis, linear regression (Amral et al., 2008;Dudek, 2016;Saber and Alam, 2017) and smoothing spline (Engle et al., 1986;Harvey and Koopman, 1993) have been traditionally used. Recently, several studies have applied functional data analysis, where the daily curves of electricity demand are expressed as functions (Cabrera and Schulz, 2017;Vilar et al., 2018). It should be noted that most statistical approaches are based on probabilistic forecasts, and the distribution of forecast values is helpful for risk management (Cabrera and Schulz, 2017). On the other hand, machine learning techniques have attracted attention in recent years, such as support vector machine (SVM); Jiang et al., 2018;Yang et al., 2019); neural networks (He, 2017;Guo et al., 2018b;Kong et al., 2018;Bedi and Toshniwal, 2019;Wang et al., 2019), gradient boosting , and hybrids of multiple forecasting techniques (Miswan et al., 2016;Liu et al., 2017;de Oliveira and Cyrino Oliveira, 2018;Haq and Ni, 2019). These techniques capture complex nonlinear structures; therefore, high forecast accuracies are expected.
In practice, the time intervals are different among exploratory variables. For example, assume that the demand is forecasted for a single day that will occur several days in the future, at 30-min intervals; this is a common scenario for market transactions in electricity exchanges (e.g., the day-ahead market in the European Power Exchange, EPEX). In this example, the electricity demand would be collected at 30-min intervals using a smart meter, whereas weather forecast information, such as maximum temperature and average humidity, would typically be observed at intervals of one day. In this study, we use past demand at 30-min or 1-h intervals and daily information (e.g., maximum temperature) on the forecast day as exploratory variables. We note that our proposed model, which will be described in Section 2, is directly applicable to any time resolution of data, such as the demand in 1-min intervals and temperature forecast in 1-h intervals.
From a suppliers' point of view, it is crucial to produce an interpretable model to investigate the impact of weather on the demand. For example, estimating the fluctuations of electric power caused by weather would help develop strategies for energy-saving interventions [e.g., (Guo et al., 2018a;Wang et al., 2018)] and demand response [e.g., (Ruiz-Abellón et al., 2020)]. To produce an interpretable model, one can directly add weather forecast information to the exploratory variables in the regression model (Hong et al., 2010) and investigate the estimator of regression coefficients. More generally, techniques for interpreting any type of black-box model, including the deep neural networks, have been recently proposed, such as the Local Interpretable Model-agnostic Explanations (LIME; (Ribeiro et al., 2016)) and SHapley Additive exPlanations (SHAP; (Lundberg and Lee, 2017)). However, these methods are used for variable selection, i.e., a set of variables that plays an essential role in the forecast is selected. Variable selection cannot estimate the fluctuations in electric power caused by weather.
In contrast to variable selection, decomposition of the electricity demand at time t, say y t , into two parts is useful for interpretation: where μ t and b t are the effects of past demand and weather forecast information, respectively. Typically, we use demand at the same time interval of the previous days as exploratory variables [e.g., (Lusis et al., 2017)], and regression analysis is separately conducted on each time interval. We then construct estimators of μ t and b t , sayμ t andb t , respectively. The interpretation is carried out by plotting a curve ofb t . However, without elaborate construction and estimation of b t , we face two issues. The first issue is that the curve ofb t often becomes nonsmooth at any time interval (i.e., every 30 min) in our experience. The non-smooth daily curve of b t is unrealistic because it implies the impact of daily weather forecast on demand changes nonsmoothly every 30 min. This problem is caused by the fact that the regression analysis is separately conducted at each time interval. To address this issue, we should estimate parameters under the assumption that b t is smooth.
The second issue is related to the parameter estimation procedure. In many cases, the regression coefficients are estimated through the least squares method. However, in our experience, the estimate of regression coefficients related to b t can be negative, leading to a negative value ofb t . Sinceb t is the fluctuations of electric power caused by weather, the interpretation becomes unclear. To alleviate this problem, we need to restrict the regression coefficients associated with b t to nonnegative values.
In this study, we develop a statistical model that elaborately captures the nonlinear structure of daily weather information to address two challenges, as mentioned earlier. We employ the varying coefficient model (Hastie and Tibshirani, 1993;Fan and Zhang, 1999) with basis expansion, where the regression coefficients associated with weather are assumed to be different depending on the time intervals. The regression coefficients are expressed by a nonlinear smooth function with basis expansion, which allows us to generate a smooth function of b t . Furthermore, the weather effectb t is also expressed as a nonlinear smooth function. To generate nonnegative regression coefficients, we employ the nonnegative least squares (NNLS, e.g., Lawson and Hanson, 1995) estimation. NNLS estimates parameters under the constraint that all regression coefficients are nonnegative. With the NNLS estimation, the value ofb t is always nonnegative; thus, the interpretation becomes clear.
The usefulness of our proposed method is investigated through the application to three real datasets. The results for two of the datasets show that the NNLS can appropriately capture the fluctuations of electric power caused by weather. Furthermore, our proposed method yields better forecast accuracy than the existing machine learning techniques. In the third example, we investigate our proposed method's practical usage when COVID-19 influences the electrical demand (e.g., facility closure or recommendation of telework). Our proposed method is directly applicable in such a situation; in addition to the daily weather forecast, we use the average number of infections in the past several days as exploratory variables. The result shows that our proposed method can adequately capture the effect of COVID-19 and also improve forecast accuracy.
The remainder of this paper is organized as follows: Section 2 describes our proposed model based on the varying coefficient model. In Section 3, we present the parameter estimation via nonnegative least squares. Section 4 presents the analysis of data from Tokyo Electric Power Company Holdings. In Section 5, we investigate the impact of COVID-19 on electrical demand through the analysis of data from one selected research facility in Japan. Concluding remarks are given in Section 6. Some technical proofs and additional information of the data analysis are deferred to the Supplementary Material.

PROPOSED MODEL
Short-and medium-term forecasting is often used for trading electricity in the market. Among various electricity markets, the day-ahead (or spot) and the intraday markets are popular in electricity exchanges, including the European Power Exchange (EPEX) (https://www.epexspot.com/en/market-data/ dayaheadauction) and Japan Electric Power Exchange (JEPX) (http://www.jepx.org/english/index.html). In the day-ahead market, contracts for the delivery of electricity on the following day are made. In the intraday market, the power will be delivered several tens of minutes (e.g., 1 h in JEPX) after the order is closed. In both markets, transactions are typically made in 30-min intervals; thus, the suppliers must forecast the demand in 30-min intervals. In this study, we consider the problem of forecasting demand that can be applied to both day-ahead and intraday markets.
Let y ij be the electricity demand at jth time interval on ith date (i 1, . . . , n, j 1, . . . , J). Typically, J 48, because we usually forecast the demand in 30-min intervals. We consider the following model: where μ ij is the effect of past electricity demand, b ij is the effect of weather, such as temperature and humidity, and ε ij are error terms with E[ε ij ] 0. Typically, the error variances in the daytime are larger than those at midnight because of the uncertainty of human behavior in the daytime. Therefore, it would be reasonable to assume that V[ε ij ] σ 2 j , i.e., the error variances depend on the time interval. One may assume the correlation of errors for different time intervals, i.e., Cor(ε ij , ε ij′ ) ≠ 0 for some j ≠ j′; however, the number of parameters becomes large. For this reason, we consider only the case where the errors are uncorrelated. Note that the final implementation of our proposed procedure described later is independent of the assumption of the correlation structure in errors.
One can express b ij and μ ij as linear or nonlinear functions of predictors and conduct the linear regression analysis. With this procedure, however, we face two issues, as mentioned in the introduction; thus, we carefully construct appropriate functions of b ij and μ ij .

Expression of b ij
Weather forecast information is typically observed at intervals of one day and not 30 min (e.g., the maximum temperature or average humidity). For this reason, we assume that the weather forecast information does not depend on j. Let a vector of weather information be s i . We express b ij as a function of s i . Here, we assume two structures as follows: • It is well known that the relationship between weather variables and consumption is nonlinear. For example, the relationship between maximum temperature and consumption is approximated by a quadratic function (e.g., 30) because air conditioners are used on both hot and cold days. For this reason, it is assumed that b ij is expressed as some nonlinear function of s i . • Although s i does not depend on j, the effect of weather, b ij , may depend on j. For example, consumption in the daytime is affected by the maximum temperature more than that at midnight. In this case, the regression coefficients associated with s i change according to the time interval j. However, if we assume different parameters at each time interval, the number of parameters can be large, resulting in poor forecast accuracy. To decrease the number of parameters, we use the varying coefficient model, in which the coefficients are expressed as a smooth function of the time interval.
Under the above considerations, we propose expressing b ij as follows: where g m (s i ) (m 1, . . . , M) are basis functions given beforehand, β m (j) are functions of regression coefficients, and M is the number of basis functions. We also use the basis expansion for β m (j): where h q (j) (q 1, . . . , Q) are basis functions given beforehand and c qm are the elements of the coefficient matrix Γ (c qm ). Substituting (4) into (3) results in the following: Because h q (j) and g m (s i ) are known functions, the parameters concerning b ij are c qm .
Since the effect of weather is assumed to be smooth according to both j and s i , we use basis functions h q (j) and g m (s i ), which produce a smooth function, such as B-spline and the radial basis function (RBF). Our numerical study adopts the B-spline and cyclic B-spline functions as basis functions of g m (s i ) and h q (j), respectively. For detail, please refer to Section 4.1. The RBF could be used but it includes a tuning parameter that highly affects the forecast error, resulting in low forecast accuracy in our experience.

Expression of μ ij
Since μ ij is the effect of past consumption, one can assume that μ ij is expressed as a linear combination of past consumption where T and U are positive integers, which denote how far we trace back through the data and α jt (t 1, . . . , T) and β ju (u 1, . . . , U) are positive values given beforehand. The α jt correspond to the effects of past consumptions for the same time interval on previous days, while β jt are the coefficients for different time intervals on the same day. For the day-ahead market, we assume that β ju ≡ 0. Here, L α and L β are nonnegative integers that describe the lags; these values change according to the closing time of transactions. For example, transactions of the day-ahead market in the JEPX close at 17:00 on the previous day of the delivery of electricity. Thus, for instance, when the trade of the electricity demand is in the 17:30-18:00 interval on Sep 20th, we cannot use the electricity demand in the same time interval on Sep 19th due to the trading hours of the market, resulting in L α 1. The design of T and U depends on the aim of the analysis. For example, if the analyst should use only the latest demand, T and U are selected as (T, U) (1, 1). The usage of only one latest demand value could often yield unstable forecast value; in this case, larger T and U would provide better results. However, too large T and U can result in overfitting. Thus, we do not generally know the best T and U to achieve the best forecast accuracy. In our numerical experiments in Sections 4 and 5, all tuning parameters, including T and U, are selected by cross-validation. In our experience, the cross-validation generally provides good forecast accuracy.
In practice, however, it is assumed that past consumption also depends on past weather, such as daily temperature. For example, suppose that it was exceptionally hot yesterday and it is cooler today. In this case, it is not desirable to directly use past consumption as the predictor; it is better to remove the effect of past temperature from past consumption. In other words, we can use and y i(j−u−L β ) , respectively. As a result, μ ij is expressed as follows: Substituting (5) into (7) results in the following: The appropriate values of α jt and β ju are chosen by several approaches. A simple method is α jt 1/T and β ju 1/U, which implies μ ij is the sample mean of the past consumption. Another method is based on the AR(1) structure, i.e., α jt ρ t α and β ju ρ u β , where ρ α and ρ β satisfy T t 1 ρ t α 1 and U u 1 ρ u β 1, respectively. Note that T t 1 ρ t ρ(1 − ρ T )/(1 − ρ), so T t 1 ρ t 1 is equivalent to ρ T+1 − 2ρ + 1 0, whose numerical solution is easily obtained.

Proposed Model
By combining the expressions of b ij in Eq. 5 and μ ij in Eq. 8, the model Eq. 2 is expressed as follows: The model Eq. 9 is equivalent to the linear regression model where γ vec(Γ) and ε vec(E) with E (ε ij ). Here, X andỹ are considered as the design matrix and the response vector, respectively. The definitions ofỹ and X are given in Section S1 in the Supplementary Material.

Nonnegative Least Squares
To estimate the regression coefficient vector γ, one can use the least squares estimation (LSE) min γ ỹ − Xγ 2 2 .
In our experience, however, the elements of least squares estimateγ often become negative. In such cases, the estimate of b ij is negative because the basis functions h q (j) and g m (s i ) generally take positive values. When b ij < 0, one can interpret the weather effect is negative. Nevertheless, the one-day curve ofb ij turns out to be counterintuitive; the effect of weather negatively increases as the electricity demand increases. In other words, b ij is negatively large at working hours and small at midnight. As a result, μ ij becomes extremely large in the working time. We observe this phenomenon on the analysis of three datasets presented in this study; one of them is the well-known Global Energy Forecasting Competition 2014 (GEFCom 2014) data. Therefore, this phenomenon could occur in other datasets.
A clear interpretation is realized when the weather effect b ij is nonnegative. To achieve this, we employ the nonnegative least squares (NNLS) estimation, in which we minimize the loss function under a constraint the regression coefficients are nonnegative: The optimization problem Eq. 11 is a special case of quadratic programming with nonnegativity constraints [e.g., (Franc et al., 2005)]. As a result, the NNLS problem becomes a convex optimization problem. Several efficient algorithms to obtain the solution in Eq. 11 have been proposed in the literature [e.g., (Lawson and Hanson, 1995;Bro and De Jong, 1997;Timotheou, 2016)].
We add the ridge penalty (Hoerl and Kennard, 1970) to the loss function of the NNLS estimation: where λ > 0 is a regularization parameter. In our experience, the ridge penalization improves the forecast accuracy, and also produces a smoother function ofb ij than the unpenalized NNLS, which makes the interpretation of the weather effect easier.

Forecast
For the day-ahead forecast, we forecast the demand on the next day,ŷ (i+1)j , for the given NNLS estimateγ and weather information s i+1 . The forecast valueŷ (i+1)j is expressed as follows: On the intraday forecast, we may use information about the demand on that day so thatμ (i+1)j is expressed aŝ Construction of a forecast interval based on Eq. 11 or Eq. 12 is not easy due to the constraints of the parameter. To derive the forecast interval, we employ a two-stage procedure; first, we estimate the parameter via NNLS to extract variables that correspond to nonzero coefficients. Then, we employ the least squares estimation based on the variables selected in the first step.
With this procedure, we should derive the forecast interval after model selection. To achieve this result, the post-selection inference (Lee and Taylor, 2014;Lee et al., 2016) is employed. The post-selection inference for the NNLS estimation is detailed in Section S2 in the Supplementary Material.

APPLICATION TO DEMAND DATA FROM TOKYO ELECTRIC POWER COMPANY HOLDINGS
The performance of our proposed method is investigated through the analysis of electricity demand data collected from Tokyo Electric Power Company Holdings, available at https://www. tepco.co.jp/en/forecast/html/download-e.html. The dataset consists of electricity demand from April 1st, 2016, to March 30th, 2020. The demand is shown in MW at 1-h intervals (i.e., J 24).
We forecast the demand from April 1st, 2019 to March 30th, 2020 (data in 2016-2018 are only used for training). The training data consist of all demand data up to the previous day of the forecast day; for example, when we forecast the demand on February 4th, 2020, the training data are demand from April 1st, 2016, to February 3rd, 2020. For the sake of simplicity, we consider the problem of the day-ahead forecast, that is, β ju ≡ 0. Also, we set L α 0.

Basis Functions
We use maximum temperature in Tokyo as the weather variable s i , available at Japan Meteorological Agency (https://www.jma.go. jp/jma/indexe.html). Figure 1 shows the relationship between the maximum temperature and demand. We depict this relationship in different time intervals for Sunday, Monday, and Wednesday. The curves are depicted by fitting the ordinary least squares estimation with the cubic B-spline function (e.g., Hastie et al., 2008) with equally-spaced knots. The number of basis functions is M 5. The B-spline function of k-degree is constructed by linear combination of M basis functions with M + 2k knots, say t 1 , . . . , t M+2k . For given n data points in increasing order, say x 1 , . . . , x n , we set the knots as t 1 ≤ t 2 ≤ / ≤ t k x 1 ≤ / ≤ x n t M+k+1 ≤ / ≤ t M+2k . The B-spline function is the density function of a convolution of uniform random variables (Hastie et al., 2008), and it is constructed by the following recurrence relation (De Boor, 1972): where 1 A (x) is the indicator function, that is, 1 A (x) 1 when x ∈ A and 1 A (x) 0 otherwise. We adopt the cubic B-spline (k 3). The left panel of Figure 2 depicts the cubic B-spline function with M 10. For all settings, the curve-fitting works well, which implies the usage of the B-spline function as a basis function g(s i ) would be reasonable.
The curve shape for 9:00-10:00 is different from that for 15: 00-16:00, which suggests that the regression coefficients must be different among time intervals. Meanwhile, the curve shapes for 15:00-16:00 and 18:00-19:00 are similar. In this case, it is reasonable to assume that the regression coefficients for 15: 00-16:00 are similar to those for 18:00-19:00. The regression coefficients are assumed to change depending on the time interval yet to be smooth according to the time interval. To achieve a smooth function of β m (j), we also use the cubic B-spline as a basis function of h q (j). As the ordinary B-spline function cannot produce a smooth curve around the boundary (i.e., 23:00-0:00 and 0:00-1:00), we employ the cyclic B-spline function, where the basis functions wrap at the first and last knot locations. The cyclic B-spline function is implemented in the cSplineDes function in the mgcv package in R. The right panel of Figure 2 shows the cyclic B-spline function of degree 3 with M 10.
We also observe that the curve shapes are different among each day of the week. Therefore, we construct the statistical models by day of the week separately; seven statistical models are constructed. To forecast the demand, we select a model that matches the day of the week. All national holidays are regarded as Sunday; therefore, the number of observations on Sunday is larger than on other weekdays.

Candidates of Tuning Parameters
We employ our proposed method based on two estimation procedures: ridge estimation min γ ỹ − Xc 2 2 + λ c 2 2 , and NNLS estimation with the ridge penalty in Eq. 12. We label these estimation procedures as "LSE" and "NNLS," respectively. For both LSE and NNLS, we prepare a wide variety of statistical models by changing the tuning parameters. Here, we review the role of each tuning parameter and present their candidates as follows: • Q: the number of basis functions in the varying coefficient model. As Q increases, the electricity fluctuations caused by weather becomes large in time interval j (j 1, . . . , J). As we will show in Section 4.3, the value of Q may not highly affect the forecast accuracy. Therefore, it would be sufficient to set the candidates of Q as Q 5, 10.  • M: the number of basis functions for the impact of weather on demand. As M increases, the electricity fluctuations caused by weather becomes large in temperature s i . As shown in Figure 1, M 5 would be sufficient to express the relationship between weather and demand. Therefore, the candidates of M are M 5, 10. • T: the number of past demands for forecasting. In other words, we use demand in the past T days to forecast demand. In this numerical analysis, we construct the statistical models by day of the week separately. Therefore, the forecast is done by using the past T weeks on the same day of the week. Consequently, it would be sufficient to set the candidates of T as T 2, 4. • λ: regularization parameter for ridge regression. As λ increases, the regression coefficients become stable. Small λ prevents the overfitting, but too large λ leads to large bias.
The candidates of λ are 20 sequences from 10 -5 to 1.0 on a log-scale. We note that the log-scale sequence is often used to set the candidates of the regularization parameters (e.g., Friedman et al., 2010). We also set λ 0 to investigate the impact of the ridge parameter on the forecast accuracy. As we will see in Section 4.3, it would be sufficient to set the above candidates of λ to achieve good accuracy.
In addition to the above candidates of models, we consider two types of α jt : sample mean of the past demand (i.e., α jt 1/T) and AR(1) structure. Details of the AR(1) structure are presented at the end of Section 2.2. As a result, the total number of candidates of the models is 336 ( 2 × 2 × 2 × (20 + 1) × 2). These candidates are summarized in Table 1.

Impact of Tuning Parameters on Forecast Accuracy
The forecast accuracy of our proposed method depends on the tuning parameters presented above. We investigate the impact of tuning parameters on the mean absolute percentage error (MAPE) defined as Figure 3 shows the relationship between regularization parameter λ and MAPE. The relationships are investigated for all combinations of (Q, M, T).
The result shows that the AR(1) structure on α jt produces smaller MAPE than the mean structure (α jt 1/T) for all candidates of ridge parameter λ. For AR(1) model, the performance for T 4 is better than that for T 2. We observe that a large value of λ may result in poor forecast accuracy; meanwhile, a small amount of λ generally results in good accuracy. The result for λ 0 is not displayed here because the MAPE can be extremely large due to the non-convergence of parameters; the ridge penalty helps avoid such a nonconvergence. In summary, the AR(1) structure on α jt , T 4, and a small value of λ will result in a small MAPE for this dataset.

Forecast Accuracy
We compare the performance of our proposed method with the following popular machine learning techniques: random forest (RF), support vector machine (SVM), Least absolute shrinkage and selection operator (Lasso), LightGBM (LGBM; (Ke et al., 2017)), and Long Short Term Memory [LSTM; (Hochreiter and Schmidhuber, 1997;Van Houdt et al., 2020)]. The RF, SVM, LGBM, and LSTM involve nonlinear functions, allowing us to capture complex structures. We use R packages randomForest, ksvm, glmnet, and lgbm to implement RF, SVM, Lasso, and LGBM, respectively. The LSTM is implemented by tensorflow.keras library in python. The LGBM is based on the gradient boosting decision tree and achieves good forecast accuracy in various fields of research, including energy (e.g., (Ju et al., 2019)). The LSTM is the state-of-the-art technique for analyzing time series data using artificial neural networks (ANN). For these machine learning techniques, the forecast is made by the electricity demand of past T days and maximum temperature, that is,ŷ ij f(s i , y (i−1)j , . . . , y (i−T+1)j ).
The characteristics of our proposed method and the machine learning techniques are compared in Table 2.
A detailed description of Table 2 is as follows: • With small sample sizes, the machine learning methods that capture nonlinear structure provide good accuracy except for the LSTM; in general, the ANN can often yield poor accuracy with small sample sizes (e.g., Tange et al., 2017). The accuracy of the lasso is not sufficiently but relatively high because the lasso cannot capture the nonlinear structure. When the number of observations is large, the LGBM and LSTM will perform perfectly well because these  (Lawson and Hanson, 1995). The lasso algorithm (Friedman et al., 2010) is also extremely fast. The RF is a little bit slow with large sample sizes due to the resampling procedure. The LGBM is very fast regardless of sample sizes in our experience. The SVM becomes extremely slow with large n (e.g., n 100,000) because we must compute the Kernel function for each pair of observations, which requires O(n 2 ) operations.
In practice, we need to select a set of tuning parameters among candidates. The candidates of the tuning parameters for the proposed method are presented in Section 4.3. For machine learning methods, the candidates are detailed in Section S3 in the Supplementary Material. For all methods, we must select an appropriate set of tuning parameters. These tuning parameters are generally selected with cross-validation. In this study, we choose these tuning parameters to minimize the MAPE in the past year. The values of tuning parameters are changed every month. Table 3 shows the monthly MAPE for our proposed method and existing methods from April 2019 to March 2020. The result shows that the proposed method performs better than existing machine learning techniques in total. The SVM achieves the best performance in specific months but sometimes provides much larger MSE than other methods, resulting in worse performance than our proposed method in total. The RF and LGBM yield similar performance, and these methods produce larger MAPE than our proposed method. The lasso cannot capture the nonlinear relationship between temperature and demand as in Figure 1, resulting in poor performance in total. The LSTM provides the worst performance, probably due to the small number of observations: it is from n 132 to n 302. In general, the ANN does not perform well with such small sample sizes; (Tange et al., 2017) showed that the ANN performed worse than SVM for small sample sizes, and several similar numerical results are found in the literature (Bećirović and Ćosović, 2016;Dehalwar et al., 2016;Maldonado et al., 2019). We observe that the NNLS and LSE result in similar values of MAPE. These similar results are obtained probably due to the identifiability of the parameter. Our proposed procedure decomposes y t as y t ≈ μ t + b t , and this decomposition is not unique. Therefore, the NNLS and LSE produce different values of μ t andb t but similar values ofμ t +b t .
We also compute the MAPE for well-known Global Energy Forecasting Competition 2014 (GEFCom 2014) data (Hong and Fan, 2016), and obtain a similar result as that shown in Table 3. For detail, please refer to Section S4 in the Supplementary Material.

Interpretation
With our proposed method, the estimated model can be interpreted by decomposing the forecast value by the effects of temperature and past demand:ŷ ij b ij +μ ij . The values,b ij and μ ij , for NNLS and LSE from October 1st to 14th, 2019, are depicted in Figure 4.
Although the forecast accuracy of the LSE estimation is similar to that of the NNLS estimation, as shown in Table 3, the results of the decompositionŷ ij b ij +μ ij for these two methods are unalike in terms of values. The LSE often results in negative values ofb ij , and thenμ ij becomes substantially larger than the actual demand. In particular, the value ofb ij at working hours is negatively larger than that at midnight; on the other hand, the electricity is used primarily during working hours. As a result, the behavior ofb ij is counterintuitive;b ij becomes FIGURE 6 | Relationship between the number of daily infections and demand (upper panels), and between the average number of infections in the past 14 days and demand (lower panels). We depict these plots on working days. The curve fitting is done by polynomial regression with cubic function.
Frontiers in Energy Research | www.frontiersin.org December 2021 | Volume 9 | Article 724780 negatively larger asŷ ij increases. Thus, interpreting the effect of weather turns out to be difficult with LSE. This issue occurs because there are no restrictions on the sign ofb ij . We observe a similar behavior ofb ij on other datasets, including GEFCom 2014. In contrast, the effect of weather is appropriately captured by the NNLS estimation. Indeed, in many cases, the value ofb ij at working hours is positively larger than that at midnight. The constraint on nonnegativeness of γ significantly improves the interpretation of the weather effect while still maintaining excellent forecast accuracy.
To further investigate the effect of temperature, we depict b ij on Tuesday when maximum temperatures are 5°C (cold day), 20°C (cool day), and 35°C (hot day), which is shown in Figure 5. For existing methods, it is difficult to depictb ij because these methods do not assume the existence ofb ij . For existing methods, instead ofb ij , the weather effect is calculated depending on the input: we forecast the demand with specific temperature (5°C, 20°C, and 35°C) and average annual temperature (21.7°C in this case), and subtract the forecast value with average annual temperature from that with a specific temperature. We depict the weather effect on October 8th, 2019, and October 15th, 2019, for existing methods; both of these days are Tuesday.
The results of our proposed procedure show thatb ij is highly dependent on the temperature: the weather effect may be substantial for cold and hot days due to air conditioner use. For NNLS, the weather effect is always positive, which allows clear interpretation compared with LSE. Both NNLS and LSE result in smooth curves due to the smooth basis function of the regression coefficients. We observe that the weather effect is stable unless M is large and λ is excessively small.
The existing machine learning methods, RF, SVM, LGBM, and LSTM, are unstable and highly depend on the input. With the Lasso, the impact of temperature is almost zero, which implies that the weather effect cannot be captured. Our proposed method results in smoother curves than existing methods thanks to the B-spline basis expansion of regression coefficients in Eq. 4. As a result, our proposed method is more suitable for interpreting the weather effect than existing methods.

APPLICATION TO DATA FROM ONE SELECTED RESEARCH FACILITY IN JAPAN
In the second real data example, we apply our proposed method to the demand data from one selected research facility in Japan. The raw data cannot be published due to confidentiality. This  dataset consists of demand from January 1st, 2017 to May 30th, 2020. At certain times, demand is either not observed or include outliers due to electricity meter failures or blackouts. The daily data that contain such missing values and outliers are removed, resulting in 1,164 days of complete data. The demand is shown in kW at 1-h intervals (J 24). We observe that COVID-19 greatly influences the usage of this research facility's electricity pattern due to the recommendation of telework. In present times, it is essential to forecast the demand for this extraordinary situation. To this end, our proposed method is directly applicable to the input of daily data related to COVID-19, and we focus our attention on the forecast accuracy in March, April, and May 2020.

Input of COVID-19 Information
In Japan, the most frequently-used daily information about COVID-19 is the daily number of infections, available at https://www3.nhk.or.jp/news/special/coronavirus/data-all/. However, this variable turns out to be relatively unstable. In order to use more stable information about COVID-19, we may use the moving average of the number of infections; that is, the average number of infections in the past several days.
The research facility used in this study promotes the staff to work from home to prevent the spread of COVID-19. We expect that the electricity usage of the research facility decreases as the number of infected people of the COVID-19 increases; therefore, the COVID-19 information may affect the forecast results. Figure 6 shows the relationship between the number of daily infections and demand, and between the average number of infections in the past 14 days and demand. We depict these plots on working days. The curve fitting is done by polynomial regression with cubic function. The B-spline may not be suitable in this case due to the limited number of observations.
The number of daily infections includes outliers; thus resulting in unstable curve fitting. For example, when the number of infections is relatively large, the demand increases as the number of infections increases. This phenomenon is counterintuitive because this facility encourages working at home as much as possible to prevent infections. On the other hand, the average number of infections in the past 14 days is negatively correlated with the demand, and the curve fitting of the cubic function works well. For this reason, we use the average number of infections in the past 14 days as daily information about COVID-19. As a result, the daily variable s i becomes a twodimensional vector that consists of the maximum temperature and the average number of infections in the past 14 days. We use the cubic B-spline function as a basis function of maximum temperature, and the cubic function as a basis function of the average number of infections in the past 14 days.

Forecast Accuracy
In the previous real data example, we construct the statistical models by day of the week separately. However, in this case, the number of observations affected by COVID-19 is excessively small. Thus, only two statistical models are constructed based on working days (from Monday to Friday) and holidays (Saturday, Sunday, and national holidays). We consider the problem of the day-ahead forecast from March 1st, 2020 to May 30th, 2020 (data in 2017-2019 are only used for training). The training data consist of all demand data up to the previous day of the forecast day.
We investigate how the input of COVID-19 information improves the forecast accuracy. The NNLS does not produce negative regression coefficients but we observe that the electricity usage decreases as the average number of infections in the past 14 days increases, as shown in Figure 6. Thus, the electricity fluctuation caused by COVID-19, say b covid19 ij , should be negative. As the basis function, we use the cubic function of negative number of infections in the past 14 days; that is, b covid19 where s covid19 i is the number of infections in the past 14 days. Figure 7 shows the forecast values with our proposed method using NNLS and the actual demand. We depict two forecast values: one uses the information about COVID-19 and the other does not. The result shows that the usage of the COVID-19 data slightly improves the accuracy; for example, on the 10th and 13th. Figure 8 showsb ij of both temperature and COVID-19. The results show that after April 10th, the negative effect of COVID-19 is observed. The government declared a state of emergency on April 7th, and following this, the usage pattern of the electricity changed. The change in electricity pattern on the 8th and 9th may not be captured due to excessively small sample sizes related to the number of infections. However, after the 9th, the effect of COVID-19 is captured. (insert Figure 8 here). Table 4 shows MAPE for our proposed method and existing methods on March, April, and May 2020. We also compare the performance of two estimation procedures: one uses the information about COVID-19 and the other does not. In March, the result shows that the proposed method performs the best and Lasso yields the worst performance. This is because the effect of temperature can be appropriately captured as shown in the previous example. The effect of COVID-19 is not crucial in March because the performance becomes worse when the COVID-19 information is included. In April and May, the information about COVID-19 significantly improves the performance for almost all methods. In particular, our proposed method and Lasso both produce small MAPE values, and interestingly, Lasso performs slightly better than our proposed method. There are two reasons to explain the superiority of Lasso. First, the relationship between the number of infections and the demand is approximated by a linear function, as shown in Figure 6. Second, the temperature effect is not crucial because the temperature does not change much in April and May. The nonlinear machine learning techniques, that is, SVM, RF, LGBM, and LSTM yield large MAPE in April, probably due to the small number of observations related to COVID-19; these techniques result in overfitting. In particular, LSTM produces much worse MAPE than other methods. Such poor results may be due to the instability of the electricity demand; the neural networks can be highly sensitive to noise [e.g., (Goodfellow et al., 2014)]. Indeed, we observe that the electricity demand of this dataset turns out to be unstable compared with the datasets from Tokyo electric power company holdings and Global Energy Forecasting Competition 2014. To improve the accuracy of the LSTM, we need to change the input/output by incorporating much more information of the data (e.g., not daily but hourly weather and hourly COVID-19 data).

CONCLUDING REMARKS
We have constructed a statistical model for forecasting future electricity demand. To capture the nonlinear effect of weather information, we employed the varying coefficient model. With the ordinary least squares estimation (LSE), the estimate of b ij , sayb ij , became negative because some of the elements of regression coefficientsγ were negative. The negative weather effect led to the difficulty in the interpretation of the weather effect. To address this issue, we employed the NNLS estimation; this estimation is performed under the constraint that all of the elements ofγ are nonnegative. The practicality is illustrated through three real data examples. Two of these examples showed that our method performed better than the existing machine learning techniques. In the third example, our proposed method adequately captured the impact of COVID-19. Estimating the fluctuations of electric power caused by weather and COVID-19 would help make strategies for energy-saving interventions and demand response.
The proposed method is carried out under the assumption that the errors are uncorrelated. In practice, however, the errors among near time intervals may be correlated. As a future research topic, it would be interesting to assume a correlation among time intervals and estimate a regression model that includes the correlation parameter. Another important point is to incorporate the weather effects other than temperature and humidity; we may use cloud cover (Apadula et al., 2012), rainfall, and wind speed (Chapagain and Kittipiyakul, 2018). However, incorporating too many variables can often result in low forecast accuracy due to overfitting (Xie et al., 2016). The selection of an appropriate combination of variables is essential but beyond the scope of this study; we would like to take this as a future research topic.
Recently, demand response has been becoming one of the most important topics in the research field of energy. In particular, the machine learning techniques have been recently used for residential demand response (Zhou et al., 2016;Pallonetto et al., 2019;Afzalan and Jazizadeh, 2020). These techniques are based on the deterministic approaches. On the other hand, our proposed statistical model will lead to the demand response by the probabilistic approach (Schachter and Mancarella, 2015;Alipour et al., 2017). The probabilistic approach will allow the decision based on the probabilistic risk evaluation. Such demand response would be essential in practice, and we would like to work on this topic in the future.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.

FUNDING
This work was partially supported by the Japan Society for the Promotion of Science KAKENHI Grant Number JP19K11862 and the Center of Innovation Program (COI) from JST Grant Number JPMJCE1318, Japan.

ACKNOWLEDGMENTS
The author would like to thank reviewers for the constructive and helpful comments that improved the quality of the paper considerably. The author also thanks Professor Hiroki Masuda and Maiya Hori for helpful comments and discussions.