LASSO Principal Component Averaging – a fully automated approach for point forecast pooling

This paper develops a novel, fully automated forecast averaging scheme, which combines the LASSO estimation method with Principal Component Averaging (PCA). LASSO-PCA (LPCA) explores a pool of predictions based on a single model but calibrated to windows of different sizes. It uses information criteria to select the tuning parameters and hence reduces the impact of researchers' ad hoc decisions. The method is applied to average predictions of hourly day-ahead electricity prices over 650 point forecasts obtained with various lengths of calibration windows. It is evaluated on four European and American markets with almost two and a half years of out-of-sample data and compared to other semi- and fully automated methods, such as the simple mean, AW/WAW, LASSO and PCA. The results indicate that LASSO averaging is very efficient in terms of forecast error reduction, whereas the PCA method is robust to the selection of the specification parameter. LPCA inherits the advantages of both methods and outperforms the other approaches in terms of MAE, remaining insensitive to the choice of the tuning parameter.


Introduction
Electricity price forecasting (EPF) is nowadays perceived as fundamental for decision making in energy markets. As short-term transactions provide a tool for adjusting long-term positions and a benchmark in over-the-counter (OTC) trading, the day-ahead, intraday and balancing prices play a key role in day-to-day operations (Kath and Ziel, 2018; Maciejowska et al., 2019; Mayer and Trück, 2018; Weron, 2014). In the last decades, the market share of renewable energy sources has rapidly increased. As a result, intermittent changes in the generation level and structure have become more likely to occur. This leads to an increase in market imbalances and a rise in electricity price volatility (Gianfreda et al., 2016; Kowalska-Pyzalska, 2018; Maciejowska, 2020). Hence, reliable methods dedicated to EPF are essential for the rational management of energy companies.
One of the methods to increase prediction accuracy is to combine forecasts obtained with different models. The idea of forecast averaging dates back about half a century. The pioneering papers of Bates and Granger (1969) and Crane and Crotty (1967) inspired many authors to develop new methods and contribute to the area. Since the late 1960s, hundreds of papers have suggested the superiority of forecast combinations over individual models (Timmermann, 2006; Wallis, 2011; Nowotarski and Weron, 2016). Hibon and Evgeniou (2005) state that the main advantage of combining forecasts is the fact that, in practice, it is less risky to combine forecasts than to select an individual forecasting method.
Recently, more and more experts have paid attention to the selection of the calibration window, which is used for model estimation (see Pesaran and Timmermann, 2007). Marcos et al. (2020) claim that in rapidly developing markets, such as energy markets, researchers should take into account structural breaks and adjust model parameters to market changes. The simplest solution to this issue is to work with short samples, which describe only the most recent events. This approach has some severe drawbacks, as it decreases the estimation accuracy and limits the complexity of the applied models. On the other hand, one may try to estimate the time of a structural break and include it directly in a forecasting model. An assumption of a discrete shift in model parameters is, however, not suitable for more complex evolution patterns (Marcos et al., 2020). In the literature, there is no agreement on which solution is best and therefore the majority of research in EPF applies an arbitrarily chosen calibration window length. In recent articles, Marcjasz et al. (2018), Hubicka et al. (2019) and Serafin et al. (2019) suggest using a pool of different in-sample data sizes and averaging the resulting forecasts. The outcomes presented in these papers suggest that a choice of three 'short' and three 'long' calibration windows provides robust results, which outperform all individual predictions. This conclusion is questioned by Maciejowska et al. (2020), who show that the suggested solution is not valid for all electricity markets and has to be adjusted to the market specification.
The estimation of a single model with various calibration windows makes it possible to obtain a large number of predictions. For example, in Maciejowska et al. (2020) a panel of 673 forecasts is built. Moreover, the predictions in such a pool are typically very similar to each other, because a slight change of the estimation window does not alter the model parameters much. Thus, it is natural to search for methods that reduce the dimension of the problem without losing useful information. In this context, two approaches are natural candidates: the Principal Component (PC) method, which summarizes the panel with a small number of components (see Stock and Watson, 2002; Bai and Ng, 2002), and the Least Absolute Shrinkage and Selection Operator (LASSO, Tibshirani, 1996), which reduces the dimension of a model by assigning a penalty to non-zero parameters. Here, we propose a novel approach, which combines these two methods and applies them to forecast averaging.
PC is a well-known tool, which has been successfully applied to the analysis of big panels of data. It has been used to predict the variables of interest directly (Boivin and Ng, 2005; Stock and Watson, 2012) or to augment a small-scale econometric model (Banerjee et al., 2014). Factor models have been extended to account for dynamic relationships (see Forni et al., 2000; Forni and Lippi, 2001) and used to create economic indicators (Stock and Watson, 1998). Although the potential of PCA in the forecast averaging area was recognized by Chan et al. (1999) and Huang and Lee (2010), only a few papers illustrate its performance. Stock and Watson (2004) and Poncela et al. (2011) used PCA to predict macroeconomic variables. They estimated components from a panel of forecasts coming from either different models or different experts. In both cases, the panels were relatively small and diversified. Maciejowska et al. (2020) proposed an algorithm, which extracts PCs from a standardized, large panel of predictions coming from a single model (as in Marcjasz et al. (2018), Hubicka et al. (2019) and Serafin et al. (2019)) and uses them to calculate the final forecasts via linear regression. In that article, 1-4 components were used. The results indicate that PCA is a robust method for forecast pooling. The major issue with Maciejowska et al. (2020) is the fact that the number of PCs is either chosen a priori or selected from a small number of alternatives. Moreover, it is not clear how the approach would perform if a larger number of components were considered.
The literature proposes many methods for dealing with a large set of potential explanatory variables. Two major approaches can be distinguished: selecting an optimal model (Ludwig et al., 2015; Ziel et al., 2015; Gaillard et al., 2016; Uniejewski et al., 2016) or averaging across models (Yang, 2001; Hansen and Racine, 2012; Wand et al., 2014). Here, we adopt the first approach and apply LASSO, which was introduced by Tibshirani (1996) and is one of the most popular and important regularization methods. Because of its linear penalty function, the LASSO estimator shrinks the coefficients of the less important explanatory variables to zero. It thus becomes a tool for automated variable selection, as it identifies the significant variables and excludes the redundant ones (Uniejewski et al., 2016). In the context of prediction pooling, the LASSO technique has been successfully used in both point (Diebold and Shin, 2019) and probabilistic (Bayer, 2018; Bracale et al., 2019; Uniejewski and Weron, 2021) forecasting. It is worth noticing that, to the best of our knowledge, LASSO averaging has not been applied to point forecasting of electricity prices and therefore there is a need to evaluate its performance in this field.
The main novelty of this paper is a fully automated forecast averaging scheme, which utilizes both PCA and the LASSO regularization technique. We present an algorithm, which extends the approach described in Maciejowska et al. (2020) and allows the use of an arbitrarily large number of components. Thanks to the LASSO estimation method, the irrelevant PCs are excluded and hence the corresponding noise is reduced. Since LASSO depends on a tuning parameter, Information Criteria (IC) are applied to select its optimal value. Unlike in typical LASSO averaging, the inputs in LPCA are orthogonal to each other. Moreover, although one could use all PCs, a number of components smaller than the number of individual forecasts should be sufficient. Hence, LPCA should be much easier and faster to compute than the full-panel LASSO. As a result, the proposed methodology requires neither expert knowledge nor intuition to obtain predictions of future prices, and it should also be less computationally burdensome than existing methods.
The paper is structured as follows. In Section 2, we present the datasets, which consist of day-ahead price series as well as exogenous variables, and, at the end of the section, describe the data transformation. Next, in Section 3, we present the methodology: first how the point forecasts are obtained and afterwards how they are averaged. In the same section, we introduce a new algorithm for a fully automated approach designed to combine forecasts. Finally, in Section 4 we present the results of our study and in Section 5 we conclude the research.

Datasets
The datasets used in this study cover five years and describe four different markets: German (EPEX), Scandinavian (Nord Pool, NP), Spanish (OMIE) and American (PJM). All time series have an hourly resolution and span 1826 days from 1.01.2015 to 31.12.2019 (the data is not extended to 2020, as the COVID-19 pandemic has changed the market dynamics). The missing or 'doubled' values (corresponding to the time change) are replaced by the average of the closest observations, for the missing hours, and the arithmetic mean of the two values, for 'doubled' hours. Note that the data is double indexed, with d denoting the day and h the hour of an observation.

Day-ahead electricity prices
This research focuses on electricity prices from day-ahead markets, in which prices for all hours of the next day are established simultaneously around noon on the day preceding the delivery. A more detailed description of the day-ahead market design can be found in Weron (2014). As a result, market participants can utilize only the information available at the time of bidding. This also affects forecasters, who should include in their models only the data published before noon (see Huisman et al., 2007).
In this article, the following day-ahead prices, DA_{d,h}, are considered:
• the German market EPEX Spot (top panel in Figure 1a); data taken from the ENTSO-E transparency platform (https://transparency.entsoe.eu),
• the Scandinavian market Nord Pool (top panel in Figure 1b); data taken from the Nord Pool website (https://www.nordpoolgroup.com),
• the Spanish market OMIE (top panel in Figure 2a); data taken from the OMIE website (https://www.omie.es),
• the American market PJM COMED (top panel in Figure 2b); data taken from the PJM Data Miner (https://dataminer2.pjm.com).

Exogenous variables
The literature indicates that various exogenous factors, such as generation structure or fuel prices, have an important impact on the electricity prices and can be used for their forecasting (Gianfreda et al., 2020;Billé et al., 2022). Following Maciejowska et al. (2020), in this study, we consider day-ahead predictions of fundamental variables describing the demand and supply of electricity, which are provided by transmission system operators (TSO). The description of the data can be found in Table 1. Notice that the set of exogenous variables changes between markets and depends on the data availability. The day-ahead forecasts for all exogenous variables are plotted in Figures 1 and 2. The variables, in particular load and solar generation, exhibit strong yearly seasonality, with the load following also a weekly pattern.

Variance Stabilizing Transformation
As can easily be seen in Figures 1 and 2, electricity prices exhibit spiky behavior. It has been argued in the literature that it is possible to reduce the influence of such extreme values on forecasts by using a Variance Stabilizing Transformation (VST), and these findings are confirmed by, among others, Marcjasz et al. (2018). Here, we follow this recommendation and apply the N-PIT transformation to all variables in the dataset. Let us recall that the N-PIT transformation is based on the so-called probability integral transform. Consider a time series Y_{d,h}. Its transformation, Ỹ_{d,h}, is given by

   Ỹ_{d,h} = N^{-1}( F̂_Y(Y_{d,h}) ),   (1)

where F̂_Y(·) is the empirical cumulative distribution function of the in-sample Y and N^{-1} is the quantile function of the standard normal distribution. After the models are estimated on the transformed time series, we apply the inverse transformation to obtain the final forecast of the electricity price,

   Ŷ_{d,h} = F̂_Y^{-1}( N(Ỹ̂_{d,h}) ),   (2)

where the time series Y corresponds to the price series DA.
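As a concrete illustration, the N-PIT pair of transformations can be sketched as follows. This is a minimal implementation under one stated assumption: the empirical CDF is rescaled to the open interval (0, 1) so that the normal quantile stays finite (the exact rescaling used in the paper is not specified here, so this detail is illustrative).

```python
import numpy as np
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def npit_transform(y_insample, y):
    """N-PIT: map values through the in-sample empirical CDF F_hat,
    then through the standard normal quantile function N^{-1}."""
    n = len(y_insample)
    sorted_y = np.sort(np.asarray(y_insample))
    # empirical CDF, rescaled to (0, 1) to keep the normal quantile finite
    u = np.searchsorted(sorted_y, y, side="right") / (n + 1)
    u = np.clip(u, 1 / (n + 1), n / (n + 1))
    return np.array([_N.inv_cdf(p) for p in np.atleast_1d(u)])

def npit_inverse(y_insample, z):
    """Inverse N-PIT: normal CDF N followed by the in-sample empirical
    quantile function F_hat^{-1}."""
    u = np.array([_N.cdf(v) for v in np.atleast_1d(z)])
    return np.quantile(np.asarray(y_insample), u)
```

Because the transform is monotone, forecasting is done on the Gaussianized series and the point forecast is mapped back through `npit_inverse` at the end.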

Experiment design
The majority of research in the EPF literature chooses the length of the calibration window arbitrarily. In recent years, various studies (see Marcjasz et al., 2018; Hubicka et al., 2019; Serafin et al., 2019; Maciejowska et al., 2020) have shown that averaging predictions based on different in-sample data leads to an improvement in forecast accuracy. Here, we follow this idea and use a pool of 673 calibration window lengths, ranging from 56 (ca. two months) to 728 days (ca. two years). Unlike previous papers, this research focuses on the automation of the averaging process in order to make it independent of the ad hoc decisions of forecasters.
The pool of forecasts is obtained with a rolling window procedure, a standard approach in the EPF literature (Weron, 2014). To be more specific, the first 728 days are used for model estimation (for shorter windows, the calibration sample is left-truncated, so it ends on the same day). Next, 24 point forecasts are computed, one for each hour of the day, and finally the window is moved one day forward. The procedure is repeated until the last out-of-sample day is reached. Once the pool of predictions is created, a rolling window of 182 days (ca. half a year) is used to calibrate the averaging methods (see Section 3.3). The final predictions are evaluated using the last 916 days of the sample. The division into the point forecast, averaging and out-of-sample periods is marked by dashed lines in Figures 1 and 2. The first line marks the end of the initial 728-day calibration window for point forecasts (i.e., 1 January 2015 to 28 December 2016). The second indicates the end of the initial 182-day calibration window for averaging forecasts (i.e., 28 June 2017), which is also the beginning of the evaluation period.

Figure 2: Day-ahead prices and exogenous time series from 1 January 2015 to 31 December 2019; panel (b) shows PJM system prices (top), the day-ahead system load prognosis (middle) and the day-ahead zonal (COMED) load prognosis (bottom). The vertical dashed lines mark, respectively, the beginning of the out-of-sample test period for point forecasts (29 December 2016; also the beginning of the initial 182-day calibration window for averaging forecasts) and the beginning of the out-of-sample test period for averaging forecasts (29 June 2017). The first 728 days constitute the initial calibration window for point forecasts.
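The rolling-window design described above can be sketched as below. Here `fit_and_predict` is a hypothetical placeholder for the actual day-ahead model; a naive last-day forecast stands in for it, so the sketch only demonstrates the windowing logic, not the forecasting model itself.

```python
import numpy as np

def build_forecast_pool(prices, first_day=728, window_lengths=range(56, 729)):
    """Sketch of the rolling-window pool: for each out-of-sample day, fit
    the model on every calibration window length and store the 24 hourly
    forecasts. `prices` is a (n_days, 24) array of hourly prices."""
    def fit_and_predict(history):
        # placeholder model: naive forecast repeating the last observed day
        return history[-1]

    n_days = prices.shape[0]
    pool = {}
    for d in range(first_day, n_days):       # forecasted day
        for tau in window_lengths:           # calibration window length
            # shorter windows are left-truncated, so all end on day d - 1
            window = prices[d - tau:d]
            pool[(d, tau)] = fit_and_predict(window)
    return pool
```

The resulting dictionary, indexed by (day, window length), is the panel of predictions that the averaging methods of Section 3.3 operate on.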

Forecasting models
In this research, forecasts for all 24 hours of the next day are computed simultaneously a day in advance. Similarly to Maciejowska et al. (2020), we consider a parsimonious autoregressive structure used in a number of EPF studies (Uniejewski et al., 2016; Ziel and Weron, 2018). The originally proposed setup is expanded to include the exogenous variables presented in Section 2.2. The final model is denoted by DA. The price DA_{d,h} for day d and hour h is described by an autoregressive formula, (3), in which lagged prices and the exogenous variables enter as regressors; the solar generation term is included in (3) only for hours 9-17, because during the night and early morning hours the solar generation is too weak to impact the electricity price.

Averaging methods
According to the recent literature, the forecasting performance of statistical models is sensitive to the choice of the calibration window (Hubicka et al., 2019). Hence, it may be beneficial to average forecasts based on windows of different lengths (Pesaran and Timmermann, 2007; Hubicka et al., 2019), as this allows exploring both the local and the long-run behavior. Although estimating the same model on different data sets seems straightforward, the forecast averaging remains a demanding task. First, a large number of the predictions, namely those based on long windows, are almost identical. Extending the sample by one observation, for example from 727 to 728 days, does not alter the parameter estimates much. This feature impedes the use of typical regressions for choosing the averaging weights, as a large number of forecasts are almost collinear. On the other hand, there is a relatively small number of distinct predictions based on short windows. Unfortunately, these forecasts are also more variable and typically burdened with a larger forecast error. Finally, it is not clear how to balance the impact of the short and long windows on the final prediction.
In this paper, we consider three types of forecast combining methods. First, predictions are computed either as a simple or as a weighted mean of the individual forecasts. Next, the weights are selected with the LASSO method, which is a regression-based approach. LASSO allows the inclusion of a large number of input variables and shrinks the parameters toward zero. Hence, it can help to select the optimal window lengths. Finally, the information included in the panel of forecasts is summarized by a set of common factors (computed as principal components, PCs), which are then used to compute the predictions of interest.

Linear average (simple average, AW, WAW)
In this research, we consider three methods based on a linear average. The literature indicates that the arithmetic mean is a simple but very efficient approach (Genre et al., 2004). Here, we compute the mean over all considered window sizes, ranging from 56 to 728 days, and denote it as the simple average. Second, following Hubicka et al. (2019), a subset of six calibration window lengths is selected, which consists of three short (56-, 84-, 112-day) and three long (714-, 721-, 728-day) in-sample sizes. Forecasts based on these chosen window sizes are then averaged. This approach is denoted by AW(56,84,112,714,721,728) or simply AW. Unfortunately, both the simple average and AW assume that the weights are equal and constant over time. Therefore, they cannot adapt to changing market conditions, for example, a rising share of renewable energy sources (RES) in the generation mix.
In order to overcome this problem, Marcjasz et al. (2018) proposed extending AW to allow for data-driven weights. As in Hubicka et al. (2019), a small subset of the available forecasts is first selected. Then, instead of taking a simple average, Marcjasz et al. (2018) use the forecast errors from the previous day to assign weights to the individual predictions. The forecasts are evaluated with the Mean Absolute Error (MAE) and those which are more accurate get higher weights (for more details see equation (5) in Marcjasz et al. (2018)). Here, following Maciejowska et al. (2020), we use the whole averaging window (182 days) to compute the weights. Analogously to AW, the weighted AW is denoted as WAW(56,84,112,714,721,728) or simply WAW.
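The error-based weighting idea can be sketched as follows. This is a minimal illustration assuming weights inversely proportional to each window's MAE over the averaging period; the exact weighting formula is the one in equation (5) of Marcjasz et al. (2018) and may differ from this simplification.

```python
import numpy as np

def waw_forecast(forecast_pool, actual, new_forecasts):
    """Sketch of WAW-style weighting (illustrative, not the paper's exact
    formula): weights are inversely proportional to each window's MAE
    over the averaging period.
    forecast_pool: (T, n_windows) past forecasts,
    actual: (T,) realized prices,
    new_forecasts: (n_windows,) forecasts for the target period."""
    mae = np.abs(forecast_pool - actual[:, None]).mean(axis=0)
    weights = 1.0 / mae            # more accurate windows get higher weights
    weights /= weights.sum()       # normalize to sum to one
    return float(weights @ new_forecasts)
```

With equal historical errors this reduces to the simple average of the selected windows, which is the AW forecast.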
The application of linear averages is associated with some issues. First, when computing the simple average, the majority of inputs come from long calibration windows, which provide very similar forecasts. Hence, the long windows dominate and reduce the impact of local behavior. This drawback is mitigated in the AW and WAW approaches, as they include the same number of short and long windows and balance the impact of different window sizes. Unfortunately, AW/WAW, unlike the simple average, requires pre-selection of the number and lengths of the calibration windows used for averaging. Hence, it cannot be considered a robust approach, because a subset which works well for one market may not be plausible for another.

LASSO averaging
The idea of regularization of an estimation process can be viewed as an optimization problem:

   β̂ = argmin_β { f(X; β) + λ g(β) },   (4)

where β is a parameter vector and X is a data set. In equation (4), f(X; β) denotes a loss function, e.g. the Residual Sum of Squares (RSS) as in the Least Squares estimation method, g(β) is the penalty function and λ ≥ 0 controls its impact (Tikhonov, 1963).
In the literature, it is common to use a scaled ℓ_q norm as g(β). The most popular variant of regularization, called LASSO, was introduced by Tibshirani (1996). It sets q = 1 and f(X; β) = RSS:

   β̂_LASSO = argmin_β { RSS + λ Σ_i |β_i| }.   (5)

Due to its properties, it becomes a tool for automated variable selection and can successfully identify the most important variables (Uniejewski et al., 2016). LASSO is also one of the most popular solutions for combining point forecasts. It has become a gold standard in the literature, especially for high-dimensional problems (i.e., when the number of individual predictions exceeds the number of in-sample observations). It has the property of selecting only a few individual point forecasts even in the case of rich pools, which improves accuracy. In a recent paper, Uniejewski and Weron (2021) showed that the linear penalty regularization also works in probabilistic forecasting.
In this article, LASSO regression is used to average all 673 point forecasts from the pool (see Section 3.1). We consider a log-scaled grid of 20 values of the λ parameter (LASSO(λ)) and choose the optimal value via Information Criteria: AIC, BIC and HQC. The procedure for selecting the tuning parameter is taken from Ziel and Weron (2018) and its results are denoted by LASSO(BIC), LASSO(AIC) and LASSO(HQC).
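A minimal sketch of the λ selection step is given below. It uses a plain coordinate-descent LASSO and a BIC in which the number of non-zero coefficients serves as the effective model dimension, which is one common convention; the exact IC formulation of Ziel and Weron (2018) may differ.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimal LASSO via coordinate descent: min RSS + lam * ||beta||_1
    (no intercept, illustrative implementation)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j, then soft-thresholding
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return beta

def select_lambda_bic(X, y, lambdas):
    """Pick lambda on a (log-scaled) grid by minimizing BIC, using the
    number of non-zero coefficients as the effective dimension."""
    n = len(y)
    best = None
    for lam in lambdas:
        beta = lasso_cd(X, y, lam)
        rss = ((y - X @ beta) ** 2).sum()
        k = int((beta != 0).sum())
        bic = n * np.log(rss / n + 1e-12) + k * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, lam, beta)
    return best[1], best[2]
```

Swapping the penalty term `k * np.log(n)` for `2 * k` or `2 * k * np.log(np.log(n))` yields the AIC and HQC variants, respectively.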

Principal Component Averaging (PCA)
Many forecast averaging methods strongly depend on expert knowledge. For example, AW and WAW require pre-selection of the window lengths used in the forecast pooling. In order to overcome this issue, Maciejowska et al. (2020) proposed Principal Component Averaging (PCA) to automate the procedure of averaging over a rich pool of predictions. The authors applied the principal component method to a panel of over 650 point forecasts obtained with models calibrated on different in-sample sizes. Next, they used the estimated components in a linear regression to form the final predictions. In this way, they overcame the problem of collinearity of forecasts stemming from the same model calibrated on similar windows. Their results indicated that PCA forecast averaging leads to more accurate predictions of electricity prices in terms of MAE than the simple average, AW or WAW.
The step-by-step algorithm of PCA is described below. In the algorithm, d_f denotes the forecasted day and τ = 56, 57, ..., 728 stands for the length (in days) of the calibration window used to calculate the predictions. Moreover, during the averaging, all the hourly predictions are treated as time series and indexed with t. The averaging window includes the predicted day d_f and the 182 preceding days, i.e., t = 24(d_f − 182) + 1, ..., 24 d_f + 24. Finally, in the following parts of the paper, P̂_{t,τ} denotes the predicted electricity price for period t obtained with a τ-day calibration window, whereas P_t stands for its actual level.
1. For each period t in the averaging window, estimate the mean, μ̂_t, and the standard deviation, σ̂_t, of the panel of forecasts {P̂_{t,τ}}.
2. Standardize the forecasts and the real price with the previously estimated μ̂_t and σ̂_t:

   Ẑ_{t,τ} = (P̂_{t,τ} − μ̂_t) / σ̂_t,   Z_t = (P_t − μ̂_t) / σ̂_t.

Notice that at the time of forecasting, the last 24 elements of Z_t, corresponding to the predicted day d_f, are not known.
3. Estimate the first K principal components, (PC_{t,1}, PC_{t,2}, ..., PC_{t,K}), of the panel {Ẑ_{t,τ}}, using the method described by Bai and Ng (2002) and Stock and Watson (2004). Notice that the PCs incorporate the information from the price forecasts for all hours in the 182-day averaging window as well as for the forecasted day.
4. Estimate the parameters of the linear regression

   Z_t = β_0 + Σ_{k=1}^{K} β_k PC_{t,k} + ε_t,   (7)

with Least Squares (LS), using observations from the averaging window (without day d_f).
5. Using the estimated parameters, compute the prediction of the normalized price Z_t for t ∈ {24 d_f + 1, ..., 24 d_f + 24}, corresponding to all hours of the forecasted day d_f:

   Ẑ_t = β̂_0 + Σ_{k=1}^{K} β̂_k PC_{t,k},   (8)

and transform it back to the original level:

   P̂_t = μ̂_t + σ̂_t Ẑ_t.   (9)

Although PCA allows exploring the information included in the whole panel of forecasts, it still requires selecting the number of components used in the regression, K. Therefore, similarly to Maciejowska et al. (2020), we consider methods based on the k first PCs and denote them by PCA(k). For illustrative purposes, we also report the ex-post optimal (fixed) number of PCs taken for averaging and denote it by PCA(best).
Next, three variants of PCA are applied, which are based on Information Criteria (IC). This allows a data-driven adjustment of the number of PCs used in regression (7). We consider the same ICs that are used to select λ in the LASSO procedure. The results are denoted by PCA(BIC), PCA(AIC) and PCA(HQC), respectively.
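The PCA averaging steps can be sketched as follows. This is a minimal implementation under stated assumptions: per-period panel means and standard deviations are used for standardization, the components are obtained by SVD of the standardized panel, and the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def pca_average(pool, actual, K):
    """Sketch of PCA averaging: standardize the panel of forecasts,
    extract the first K principal components by SVD, regress the
    standardized price on the PCs (excluding the forecasted period),
    and map the fitted value back to the original level.
    pool: (T+1, n_windows) forecasts, last row = forecasted period;
    actual: (T,) realized prices for the first T periods."""
    mu = pool.mean(axis=1)                  # per-period panel mean
    sigma = pool.std(axis=1) + 1e-12        # per-period panel std
    Z = (pool - mu[:, None]) / sigma[:, None]
    # principal components of the standardized panel (all periods)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :K] * S[:K]
    z_actual = (actual - mu[:-1]) / sigma[:-1]
    X = np.column_stack([np.ones(len(actual)), pcs[:-1]])
    beta, *_ = np.linalg.lstsq(X, z_actual, rcond=None)   # eq. (7)
    z_hat = np.concatenate([[1.0], pcs[-1]]) @ beta        # eq. (8)
    return mu[-1] + sigma[-1] * z_hat                      # eq. (9)
```

The key property exploited here is that the PCs are orthogonal, so the regression step never suffers from the collinearity of the raw forecasts.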

LASSO Principal Component Averaging (LPCA)
In this paper, we propose a novel approach, which combines the PCA-based procedure with the LASSO estimation method. First, similarly to Maciejowska et al. (2020), K components are extracted from the standardized panel of point predictions (see Section 3.3.3 for a detailed description of the algorithm). Unlike in previous work, the number of PCs is substantial (here, 20 components) and can be arbitrarily large. Next, the PCs are used as input variables in regression (7). In order to estimate the model's parameters, the LASSO method is applied. This approach enables calibration of the model even when the number of PCs is larger than the size of the averaging calibration window. Moreover, it shrinks the parameters toward zero and hence reduces the noise induced by redundant components. Finally, the predictions for all hours of day d_f are calculated via (8) and transformed back into the original units via (9).
The LASSO optimization algorithm depends on a parameter λ, which specifies the impact of the penalty function. As in LASSO averaging, we consider a log-scaled grid of 20 values of λ and select the optimal value via IC. The outcomes are denoted either by LPCA(λ) or by LPCA(BIC), LPCA(AIC) and LPCA(HQC), respectively.
Since LPCA requires no prior decision on the sizes of the calibration windows used for averaging (as in AW/WAW) and is not restrictive in terms of the number of PC components (as in PCA), it can be perceived as a fully automated method. Moreover, thanks to the orthogonality of the PCs, the estimation algorithm is faster than LASSO averaging.
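A minimal sketch of LPCA is given below, under the same illustrative assumptions as the PCA sketch (per-period standardization, SVD-based components, names not taken from the paper). The averaging regression is estimated with a soft-thresholded coordinate descent, so redundant components are shrunk to exactly zero; in the paper, λ would be chosen via IC rather than fixed.

```python
import numpy as np

def lpca_average(pool, actual, K=20, lam=1.0):
    """Sketch of LPCA: K principal components of the standardized forecast
    panel, then a LASSO-estimated averaging regression.
    pool: (T+1, n_windows) forecasts, last row = forecasted period;
    actual: (T,) realized prices for the first T periods."""
    mu, sigma = pool.mean(axis=1), pool.std(axis=1) + 1e-12
    Z = (pool - mu[:, None]) / sigma[:, None]
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    K = min(K, U.shape[1])
    pcs = U[:, :K] * S[:K]
    y = (actual - mu[:-1]) / sigma[:-1]
    X = pcs[:-1] - pcs[:-1].mean(axis=0)   # demean instead of an intercept
    yc = y - y.mean()
    beta = np.zeros(K)
    col_ss = (X ** 2).sum(axis=0) + 1e-12
    for _ in range(200):                   # coordinate descent with
        for j in range(K):                 # soft-thresholding
            rho = X[:, j] @ (yc - X @ beta + X[:, j] * beta[j])
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    x_new = pcs[-1] - pcs[:-1].mean(axis=0)
    z_hat = y.mean() + x_new @ beta
    return mu[-1] + sigma[-1] * z_hat
```

Because the PCs are orthogonal, each coordinate-descent update is essentially a one-shot soft-thresholding, which is why LPCA converges faster than LASSO run on the full, highly collinear panel.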

Results
We use the Mean Absolute Error (MAE) over the full out-of-sample test period of D = 916 days (i.e., 29.06.2017 to 31.12.2019, see Figure 1 or 2) as the main evaluation criterion. It is one of the most commonly used measures of forecast accuracy. In the case of electricity markets, it reflects the average deviation of the revenue from selling 1 MWh from its expected level. In this paper, we consider two MAE-based measures:

   MAE_d^{(i)} = (1/24) Σ_{h=1}^{24} |ε_{d,h}^{(i)}|,   MAE^{(i)} = (1/(24D)) Σ_{d=1}^{D} Σ_{h=1}^{24} |ε_{d,h}^{(i)}|,

where ε_{d,h}^{(i)} = P_{d,h} − P̂_{d,h}^{(i)} is the forecast error for hour h on day d, obtained either with a calibration window of a given length τ or with one of the averaging methods. The first measure, MAE_d^{(i)}, describes the forecast accuracy on a single day d and is used for the statistical comparison of individual approaches. The second, MAE^{(i)}, describes the overall performance over the whole out-of-sample period.
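The two MAE-based measures can be computed directly from the matrix of forecast errors:

```python
import numpy as np

def mae_measures(errors):
    """Daily and overall MAE from a (D, 24) array of forecast errors
    eps[d, h] = P[d, h] - P_hat[d, h]."""
    mae_daily = np.abs(errors).mean(axis=1)   # MAE_d, one value per day
    mae_overall = np.abs(errors).mean()       # MAE over all D * 24 errors
    return mae_daily, mae_overall
```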
As an auxiliary measure, we define the percentage change in forecast accuracy relative to the results of the model with the longest considered calibration window, i.e., 728 days (MAE^{(728)}):

   %chng_i = 100 · (MAE^{(i)} − MAE^{(728)}) / MAE^{(728)}.

The relative change in accuracy shows how much a given model differs from the usual approach of taking as long a calibration window as possible. Note that a positive sign indicates that the model is worse than the benchmark, while a negative value appears when it outperforms the longest-window approach. Given the number of datasets, it is hard to rank the models' accuracy. To solve this issue, we use the mean of %chng_i over the four datasets, denoted m.p.d.b., to obtain the final ranking:

   m.p.d.b._i = (1/4) Σ_m %chng_{i,m},
where m indicates one of the four datasets (EPEX, NP, OMIE, PJM). The obtained MAE values can be used to provide a ranking of forecasts. Unfortunately, they do not allow drawing statistically significant conclusions about the outperformance of one prediction over another. Therefore, the conditional predictive ability (CPA) test of Giacomini and White (2006) is used to compare competing outcomes. The test statistic is computed from the vector of daily loss differentials based on MAE_d^{(i)} − MAE_d^{(j)}. For each pair (i, j), the p-value of the CPA test is computed.
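The relative ranking can be sketched as follows; `relative_ranking` is an illustrative helper, not code from the paper.

```python
def relative_ranking(mae_by_market, benchmark_key=728):
    """%chng of each method vs the 728-day benchmark on each market, and
    the mean of %chng across markets (m.p.d.b.) used for the final ranking.
    mae_by_market: {market: {method: MAE}}; the benchmark is keyed by
    `benchmark_key`."""
    methods = {m for d in mae_by_market.values() for m in d if m != benchmark_key}
    chng = {
        mkt: {m: 100.0 * (d[m] - d[benchmark_key]) / d[benchmark_key]
              for m in methods}
        for mkt, d in mae_by_market.items()
    }
    mpdb = {m: sum(chng[mkt][m] for mkt in chng) / len(chng) for m in methods}
    return chng, mpdb
```

A negative m.p.d.b. value means the method beats the longest-window benchmark on average across the four markets.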

Individual forecasts
The performance of the individual forecasts is presented in Figure 3, which shows the values of MAE for different calibration window lengths in the four analyzed markets. It can be observed that no single window length performs best across all markets. These outcomes confirm the findings of Marcjasz et al. (2018) and later studies, and show that it is impossible to choose the optimal length of the calibration window ex ante. Table 2 presents the detailed results for three selected window sizes: 56 days (8 weeks), 364 days (a year) and 728 days (2 years). They are next compared with the benchmark, which is the longest available calibration window. The outcomes are augmented with the results for the optimal window size, which is selected ex post and hence is not available for real-time usage. The results indicate that the selection of the calibration window length may have a great impact on the forecast accuracy. The gains from its proper choice reach up to 12.527% (EPEX market).

Averaging results
Tables 3 and 4 present the MAE and %chng results for the forecasts obtained with the different averaging techniques. Two groups of approaches are evaluated separately: semi-automated and fully automated. In the first group of methods, arbitrary decisions of researchers about the number of components to be averaged are allowed. Moreover, the penalty parameter λ in the LASSO method is pre-defined for the whole sample. In the second group, the methods are fully automated, which means that the forecaster is not involved in the averaging process.

Semi-automated averaging methods
Let us first analyze the outcomes of the semi-automated approaches, in which the researcher decides a priori on the selection of forecasts used for averaging. In all the considered methods, the inputs are chosen once for the whole evaluation period and are not adjusted as the calibration and averaging windows move. The results are reported in Table 3. First, the outcomes of the AW and WAW methods are presented, which are based on only a small subset of individual point forecasts (three short and three long windows). It can be observed that both approaches yield results that are far better than the benchmark. By averaging forecasts stemming from just six different calibration windows, the MAE is reduced by more than 10% for EPEX, NP and OMIE and by at least 3% for PJM. When the two methods are compared, it can be observed that the weighted approach is better than AW, which assigns equal weights to all predictions. Next, the error measures for LASSO, PCA and LPCA with parameters selected ad hoc, based on the existing literature and experience, are presented. For each method, the first three rows show outcomes for exemplary specifications, described either by the number of components, k, in PCA(k) or by λ in LASSO(λ) and LPCA(λ). The fourth row reports results for the best ex-post value of these parameters. The outcomes confirm that using forecast averaging techniques is beneficial. Similarly to AW/WAW, all three methods enable a substantial reduction of MAE, with the following specifications being the best: LASSO(10^0), PCA(5) and LPCA(10^-2).
When the LASSO averaging scheme is considered, the results depend strongly on the parameter λ. There are substantial differences between LASSO(10^-2) and LASSO(10^0), which reach 18.922% of the benchmark MAE for NP and 12.906% for PJM. Moreover, LASSO(10^-2) is the worst of the averaging schemes and provides predictions less accurate than the 2-year calibration window for the NP and PJM markets.
The performance of PCA is more robust to the selection of the specification parameter, k. The relation between the MAE of PCA and the number of components is non-monotonic. First, as the number of PCs increases, the forecasts become more accurate. Once the optimal level of k is reached, additional components introduce noise and lead to a rise of MAE. Hence, increasing the number of components beyond that point does not improve the overall performance of the method.
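The PCA averaging scheme can be sketched as follows: the panel of individual forecasts is summarized by its first k principal components, and the realized prices are regressed on the component scores. The function name, the OLS second stage and the toy panel are illustrative assumptions consistent with the description above.

```python
import numpy as np

def pca_average(F, y, k):
    """PCA forecast averaging (illustrative sketch).

    F -- panel of individual forecasts (time x calibration windows)
    y -- realized prices
    k -- number of principal components to keep
    """
    Fc = F - F.mean(axis=0)                          # center the panel
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    scores = U[:, :k] * s[:k]                        # T x k component scores
    X = np.column_stack([np.ones(len(y)), scores])   # add an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS weights
    return X @ beta                                  # pooled forecast

# Toy panel: 50 noisy copies of the same underlying price path.
rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 6, 200))
F = y[:, None] + rng.normal(0, 1, (200, 50))
pooled = pca_average(F, y, k=1)
```

Because the noise of the individual forecasts is averaged out along the first component, the pooled forecast is far more accurate than any single column of the panel.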
When the results of LPCA are analyzed, it can be observed that LPCA inherits the positive features of both PCA and LASSO and reduces their weaknesses. Similarly to PCA, LPCA is robust to the choice of the tuning parameter, λ. At the same time, it can use a large number of components without a loss of efficiency, because LASSO reduces the parameter space.
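The LPCA combination can be sketched in a few lines. For an orthonormal set of component directions, the LASSO solution reduces to soft-thresholding the per-component OLS coefficients; the centering, this orthonormal-design shortcut, and the toy data are illustrative assumptions, not the authors' exact estimation routine.

```python
import numpy as np

def lpca_average(F, y, lam):
    """LPCA (illustrative sketch): LASSO applied to the principal-component
    scores of the forecast panel F. With orthonormal score directions U,
    the LASSO estimate is the soft-thresholded OLS coefficient vector."""
    Fc = F - F.mean(axis=0)
    U, _, _ = np.linalg.svd(Fc, full_matrices=False)  # orthonormal directions
    yc = y - y.mean()
    b_ols = U.T @ yc                                  # OLS coefs, orthonormal design
    b = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)  # soft threshold
    return U @ b + y.mean(), b

rng = np.random.default_rng(2)
y = np.sin(np.linspace(0, 6, 200))
F = y[:, None] + rng.normal(0, 1, (200, 50))
fit_small, b_small = lpca_average(F, y, lam=0.01)
fit_large, b_large = lpca_average(F, y, lam=5.0)
```

A larger λ zeroes out more components, which is exactly the parameter-space reduction referred to above.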
Finally, when LASSO(best), PCA(best) and LPCA(best) are compared, LPCA and LASSO are each the best in two out of four markets, with the PCA scheme never reaching the top of the podium. The aggregated results, summarized by m.p.d.b., confirm that LPCA yields the most accurate predictions among all alternatives.
The results for semi-automated averaging approaches can be summarized by the following conclusions:
• Almost all averaging approaches (except LASSO(10^-2)) outperform the 'longest window' model by a large margin, often higher than 10%.
• The most accurate forecasts are obtained with LASSO and LPCA; each is the best for two out of four datasets.
• The performance of LASSO depends strongly on λ, whereas PCA and LPCA are more robust to the choice of the specification parameters.
• The idea of AW and WAW, introduced by Hubicka et al. (2019) and Marcjasz et al. (2018), performs very well; however, it can be outperformed by more sophisticated approaches.

Fully automated averaging methods
In this article, four fully automated forecast averaging methods are considered. These are approaches that do not require any expert knowledge to select the inputs used for forecast averaging or to specify parameters such as the number of components, k, in PCA and the value of the LASSO tuning parameter, λ. The results are presented in Table 4, which, similarly to Table 3, reports the MAE forecast accuracy measure and %chng.
The first method is the simple average. It is an automated approach because it does not require any pre-selection of the predictions used for pooling. This method provides forecasts that are far better than the benchmark, reducing MAE by 1.149%-10.221%, which is slightly less than in the AW/WAW case.
Next, the three methods LASSO, PCA and LPCA are analyzed. Unlike in the previous section, here the tuning parameters, k and λ, are selected with information criteria (AIC, BIC and HQ). This modification has two major advantages. First, it does not require prior knowledge on the specification of these methods in a particular application. Hence, it can be easily used for predicting prices of other commodities or for any other forecasting exercise. Second, the parameters can evolve as new data arrives and adjust to the market situation.
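The IC-based selection of the penalty can be sketched as a grid search that minimizes BIC, with the degrees of freedom approximated by the number of non-zero coefficients (a standard choice for the LASSO). The orthonormal design Q, the grid, and the toy data are illustrative assumptions.

```python
import numpy as np

def select_lambda_bic(Q, yc, lam_grid):
    """Pick the LASSO penalty by minimizing BIC (illustrative sketch).
    Q is assumed orthonormal (e.g. principal-component directions), so the
    LASSO step is a simple soft-threshold of the OLS coefficients."""
    n = len(yc)
    best_bic, best_lam, best_b = np.inf, None, None
    for lam in lam_grid:
        b_ols = Q.T @ yc
        b = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)
        rss = np.sum((yc - Q @ b) ** 2)
        df = np.count_nonzero(b)                 # LASSO degrees of freedom
        bic = n * np.log(rss / n) + df * np.log(n)
        if bic < best_bic:
            best_bic, best_lam, best_b = bic, lam, b
    return best_lam, best_b

# Toy example: 3 informative components among 40, moderate noise.
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(400, 40)))
yc = Q[:, :3] @ np.array([10.0, 8.0, 6.0]) + rng.normal(0, 0.5, 400)
lam_sel, b_sel = select_lambda_bic(Q, yc, [0.1, 2.0])
```

Because BIC penalizes each retained component by log(n), it favors the sparser fit whenever the extra components buy little reduction in the residual sum of squares.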
First, it can be noticed that the LASSO method is sensitive to the choice of IC. With AIC, it provides forecasts that are less accurate than the benchmark for three out of four analyzed markets. For PJM, the loss of accuracy exceeds 20%. Even for the EPEX market, for which the gains are the highest, LASSO(AIC) is only slightly better than the predictions obtained with the longest calibration window. Moreover, LASSO(HQC), although better than LASSO(AIC), does not provide satisfactory results. It improves the predictions for EPEX, NP and OMIE but worsens them for PJM by more than 7%. Only LASSO(BIC) gives results that are consistently better than the benchmark.
Similarly to LASSO, the performance of the LPCA approach depends on the choice of IC. In this case, the differences between ICs are less pronounced, with LPCA(BIC) providing the most accurate predictions. Hence, as for the standard LASSO, BIC should be used for selecting the parameter λ in LPCA. It is worth noting that all three LPCA methods produce the best forecasts in terms of MAE for the EPEX, NP and PJM markets. They are outperformed only by LASSO(BIC) in the OMIE case.
In the case of the PCA method, it is hard to point to a clear winner among the ICs. For each dataset, a different criterion provides the most accurate results. The differences, however, are not substantial, so the optimal number of PCs can be successfully selected via any of the considered ICs. Although the most robust, the approach is never the best choice in terms of MAE, as it is outperformed by either LPCA or LASSO.
The last column of Table 4 presents m.p.d.b, the aggregated measure of forecast accuracy. The outcomes show that well-designed averaging models can outperform the most popular approach of an arithmetic mean. Moreover, they confirm previous findings obtained using semi-automated methods and indicate that LPCA reduces MAE more than other averaging approaches.
To formally investigate the advantages of using our newly proposed averaging method, we apply the Conditional Predictive Ability (CPA; see Giacomini and White, 2006) test for significant differences in forecasting performance. The outcomes are presented in Figure 4, in which a non-black square indicates that the forecasts of the model on the X-axis are statistically more accurate than the forecasts of the model on the Y-axis. The results confirm the previous findings and show that the LPCA extension of the standard PCA approach significantly outperforms other methods, in particular the simple mean and PCA, for each considered dataset. What is more, it is significantly better than LASSO in two out of four cases and never worse. Finally, it can be noticed that the simple average is almost always outperformed by the other averaging approaches. This result shows that the arithmetic mean is useful as a benchmark for the newly introduced methodology; however, it should not be treated as a gold standard.

Figure 4: Results of the conditional predictive ability (CPA) test of Giacomini and White (2006) for forecasts of selected models for the EPEX (left), Nord Pool (left center), OMIE (right center) and PJM (right) datasets. We use a heat map to indicate the range of the p-values: the closer they are to zero (→ dark green), the more significant is the difference between the forecasts of a model on the X-axis (better) and the forecasts of a model on the Y-axis (worse).
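The mechanics of a CPA-style test can be sketched as follows: the loss differential is regressed on a set of instruments and a Wald statistic tests whether it is predictable. The choice of instruments (a constant plus one lag of the differential) and all names below are illustrative assumptions in the spirit of Giacomini and White (2006), not the paper's exact implementation.

```python
import numpy as np

def cpa_test(loss_a, loss_b):
    """Conditional Predictive Ability test (illustrative sketch).
    H0: the loss differential d_t = loss_a_t - loss_b_t is unpredictable
    given the instruments h_t = (1, d_t). Returns the Wald statistic,
    chi-squared with 2 dof under H0, and its p-value."""
    d = np.asarray(loss_a) - np.asarray(loss_b)
    h = np.column_stack([np.ones(len(d) - 1), d[:-1]])  # constant + one lag
    Z = h * d[1:, None]                # moment series h_t * d_{t+1}
    zbar = Z.mean(axis=0)
    omega = (Z - zbar).T @ (Z - zbar) / len(Z)
    stat = len(Z) * zbar @ np.linalg.solve(omega, zbar)
    pval = np.exp(-stat / 2.0)         # chi-squared(2) survival function
    return stat, pval

# Model B is clearly more accurate, so the test should reject H0.
rng = np.random.default_rng(4)
la = np.abs(rng.normal(0, 1.0, 2000))  # absolute errors of model A
lb = np.abs(rng.normal(0, 0.3, 2000))  # absolute errors of model B
stat, pval = cpa_test(la, lb)
```

The closed-form p-value is valid here because exactly two instruments are used; with q instruments the statistic is chi-squared with q degrees of freedom.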
To sum up:
• Almost all averaging approaches (except LASSO(AIC) and LASSO(HQC)) beat the 'longest window' model by a large margin.
• Among the forecasts based on the LASSO or LPCA methods, the most accurate results are obtained with BIC.
• The PCA method is the most robust to the choice of IC. None of the ICs dominates and all of them provide similar results.
• Overall, the best results are obtained with LPCA(BIC).

Discussion
In this research, the performance of different averaging schemes based on forecasts obtained with calibration windows of different lengths is analyzed. It is shown that it is beneficial to pool predictions even when they come from a single model. The large number of individual forecasts available for averaging is both the advantage and the main issue of this idea, as it makes it difficult to fully automate the computations. Here, two approaches are explored, based on information and parameter space reduction. The PCA method summarizes the data described by a panel of forecasts with a relatively small set of orthogonal components, whereas LASSO shrinks the model's parameters toward zero and hence increases the estimation efficiency. This research demonstrates that the application of both approaches can result in a substantial increase of forecast accuracy. Unfortunately, the methods are burdened with the uncertainty associated with the choice of tuning parameters. The dependence of the results on this selection is illustrated in Table 5, which shows the best outcomes in terms of m.p.d.b. together with the mean and the Mean Absolute Deviation (MAD) of m.p.d.b. across different specifications. The results indicate that although LASSO(10^0) and LASSO(BIC) are among the best forecast averaging approaches, the LASSO method is sensitive to the selection of the tuning parameter and the IC. Its average m.p.d.b. is slightly less than 6% and 3% for the semi- and fully automated approaches, respectively. At the same time, LPCA improves the forecasts by 9.582% and 10.1%, respectively. Moreover, the PCA and LPCA methods are characterized by low values of MAD, far smaller than in the LASSO case. The difference in performance of the LASSO, PCA and LPCA forecast averaging methods results from their construction. In the PCA approach, the components used for averaging are orthogonal to each other, which enables efficient estimation of the parameters of (8).
However, unlike LPCA, this approach includes all PCs from 1 to k in the regression. Applying LASSO to (8) reduces the parameter space. The method not only eliminates the insignificant components, but also shrinks the weights corresponding to less important variables. Earlier work compared LASSO with a two-step procedure consisting of variable selection via LASSO and estimation of the weights (of the selected variables) via OLS, and found that LASSO significantly outperforms the two-step procedure. A similar situation can be observed in our research. The limited study presented in Table 6 shows that applying the two-step procedure does not improve (on average) the forecast accuracy compared to PCA(BIC). This indicates that shrinkage is even more important than selection in our task. The regularization improves the averaging accuracy not because it allows a better selection of the number of PCs, but because the LASSO-shrunken weights are better suited to this setup.
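The distinction between shrinkage and pure selection discussed above can be made concrete. In the illustrative orthonormal-design setting used below (an assumption for clarity), the LASSO soft-thresholds the OLS coefficients, while the two-step procedure only hard-thresholds them: it keeps the same components but leaves their weights unshrunken.

```python
import numpy as np

def lasso_fit(Q, yc, lam):
    """LASSO on an orthonormal design: soft-threshold the OLS coefficients."""
    b_ols = Q.T @ yc
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def two_step_fit(Q, yc, lam):
    """Two-step procedure: LASSO is used only to select the components;
    the retained weights are re-estimated by OLS. For an orthonormal design
    this keeps the unshrunken OLS coefficients of the selected components,
    i.e. hard thresholding."""
    b_ols = Q.T @ yc
    keep = np.abs(b_ols) > lam
    return np.where(keep, b_ols, 0.0)

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(300, 20)))
yc = Q[:, :2] @ np.array([4.0, 3.0]) + rng.normal(0, 0.5, 300)
b_lasso = lasso_fit(Q, yc, lam=1.0)
b_two = two_step_fit(Q, yc, lam=1.0)
```

Both fits select exactly the same components; they differ only in that every retained LASSO weight is shrunken by λ, which is the effect the comparison in Table 6 isolates.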
Finally, when the LASSO and LPCA methods are compared, it can be noticed that LASSO has many more inputs than LPCA. Extracting information from the panel of forecasts via the PCs reduces the dimension of the regression. Moreover, unlike the PCs, the individual forecasts are highly correlated and almost collinear. Due to these features, LASSO is more sensitive to the specification of the tuning parameter. Moreover, the CPU time needed to compute the forecasts with LASSO is 900 times higher than the time needed to perform LPCA.

Conclusions
In this paper, a novel approach for point forecast pooling is presented, which combines the LASSO estimation method with the PCA averaging scheme introduced by Maciejowska et al. (2020). PCA summarizes the information included in a panel of forecasts with a relatively small set of orthogonal components, whereas LASSO shrinks the model's parameters toward zero and hence increases the estimation efficiency. The performance of the approach is evaluated on datasets from four major energy markets. Following Marcjasz et al. (2018) and Hubicka et al. (2019), the point predictions used for pooling stem from a single ARX-type model calibrated to windows of different sizes. The forecasts are evaluated with MAE and the results are presented relative to the outcomes obtained with the longest available calibration window, which includes two years of observations.
The results confirm the previous findings of Marcjasz et al. (2018) and Maciejowska et al. (2020) that the longest estimation window does not necessarily lead to the most accurate predictions. Hence, it is not possible to select a priori an optimal length of the calibration sample. At the same time, averaging algorithms can substantially reduce MAE and improve the forecast accuracy relative to the benchmark: by 6.631% for the simple average and by 10.271% for the LPCA(BIC) approach.
When the forecast averaging methods are considered, the outcomes indicate that fully automated approaches, which use information criteria to select an optimal specification, yield results that are significantly better than the benchmark and the simple average. The performance of the presented pooling methods depends, however, on the applied IC. The outcomes show that BIC is the most robust choice, leading to the lowest relative MAE for all approaches. The comparison of LASSO, PCA and LPCA allows drawing the following conclusions:
• The PCA method is the most robust to the choice of IC; however, it reduces MAE less than the methods using LASSO.
• LASSO is extremely sensitive to the choice of the tuning parameter and the IC.
• Overall, LPCA outperforms the other approaches: it improves the forecast accuracy the most and is relatively robust to the selection of the tuning parameter.
The LPCA approach, which combines LASSO with PCA, proves successful in forecasting day-ahead electricity prices. This research can be viewed as a first step in combining PCA with automated variable selection methods. Future analysis may include more complex models, such as the elastic net, the adaptive LASSO, or neural network-based approaches. Moreover, the research may be extended to interval and probabilistic forecasting and applied to other commodity markets.