
Construction of an informative hierarchical prior for a small sample with the help of historical data and application to electricity load forecasting

Original Paper, published in TEST

Abstract

We are interested in the estimation and prediction of a parametric model on a short dataset, on which the model is expected to overfit and perform badly. To overcome the lack of data (relative to the dimension of the model), we propose the construction of an informative hierarchical Bayesian prior based on a second, longer dataset that is assumed to share some similarities with the original short dataset. We illustrate the performance of our prior on simulated datasets from two standard models. We then apply the methodology to a working model for electricity load forecasting on real datasets, where it leads to a substantial improvement in the quality of the predictions.





Acknowledgments

The authors would like to thank Adélaïde Priou for collecting a part of the data as well as the corresponding results, and Virginie Dordonnat for the insightful discussions.

Author information

Correspondence to Anne Philippe.

Appendix

Using the notation \(M_{i\bullet }\) for the \(i\)th row of a matrix \(M\), the non-linear model described in (6) can be rewritten in the following condensed form: for \(t = 1,\ldots ,N,\)

$$\begin{aligned} y_t&= (A_{t\bullet } \alpha ) (B_{t\bullet } \beta + C_{t}) + \gamma (T_t - u)1\!\!1_{[T_t,\,+\infty [}(u) + \epsilon _t \nonumber \\&= f_t(\theta ) + \epsilon _t. \end{aligned}$$
(7)

The matrices \(A\) of size \(N\times d_\alpha \), \(B\) of size \(N\times d_{\beta }\), \(C\) of size \(N\times 1\), and \(T\) of size \(N\times 1\) contain known exogenous variables, while the parameters of the model to be estimated are

$$\begin{aligned} \eta = (\theta ,\sigma ^2) = (\alpha , \beta , \gamma , u, \sigma ^2) \in \mathbb {R}^{d_\alpha } \times B_+^{d_\beta }(0, 1) \times \mathbb {R}^* \times [\underline{u},\,\overline{u}] \times \mathbb {R}_+^*, \end{aligned}$$

where \(B_+^{d_{\beta }}(0, 1) = \{ \beta \in (\mathbb {R}_+)^{d_{\beta }} ;\; \Vert \beta \Vert _1 \le 1 \}\) is the intersection of the nonnegative orthant of \(\mathbb {R}^{d_{\beta }}\) with the \(\Vert \cdot \Vert _1\)-unit ball.
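A minimal NumPy sketch of the deterministic part \(f_t(\theta )\) of (7) may help fix the shapes, assuming \(A\) is \(N\times d_\alpha \), \(B\) is \(N\times d_\beta \), and \(C\), \(T\) are length-\(N\) vectors. The indicator \(1\!\!1_{[T_t,\,+\infty [}(u)\) is read as \(T_t \le u\). The helper names `heating_degrees`, `f_model` and `in_parameter_space`, as well as the toy values, are hypothetical and only illustrate the formula.

```python
import numpy as np


def heating_degrees(T, u):
    """Coordinates (T_t - u) * 1{T_t <= u}, i.e. the indicator 1_{[T_t, +inf[}(u) of (7)."""
    T = np.asarray(T, dtype=float)
    return np.where(T <= u, T - u, 0.0)


def f_model(alpha, beta, gamma, u, A, B, C, T):
    """Deterministic part f_t(theta), t = 1..N, of model (7), vectorised over t."""
    return (A @ alpha) * (B @ beta + C) + gamma * heating_degrees(T, u)


def in_parameter_space(beta, u, u_low, u_high):
    """True iff beta lies in B_+^{d_beta}(0, 1) and u lies in [u_low, u_high]."""
    beta = np.asarray(beta, dtype=float)
    return bool(np.all(beta >= 0.0) and beta.sum() <= 1.0 and u_low <= u <= u_high)


# Toy usage with hypothetical values (N = 3, d_alpha = d_beta = 1).
A = np.array([[1.0], [2.0], [3.0]])
B = np.array([[1.0], [0.0], [1.0]])
C = np.array([1.0, 1.0, 1.0])
T = np.array([10.0, 20.0, 5.0])
alpha, beta, gamma, u = np.array([2.0]), np.array([0.5]), -3.0, 15.0
f = f_model(alpha, beta, gamma, u, A, B, C, T)  # -> [18., 4., 39.]
```

Vectorising over \(t\) keeps the code close to the row-wise notation \(A_{t\bullet }\alpha \) and \(B_{t\bullet }\beta \) used in (7).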

Proposition 1

For \((\beta , u) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\), denote by \(A_*(\beta , u)\) the \(N\times (d_\alpha +1)\) matrix whose rows are

$$\begin{aligned}(A_*)_{t\bullet }(\beta , u) = \left[ ( B_{t\bullet }\beta + C_{t})A_{t\bullet }, (T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)\right] , \quad t=1,\ldots ,N,\end{aligned}$$

and suppose that \(A_*^\prime (\beta , u) A_*(\beta , u)\) has full rank for every \((\beta , u) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\). Assume furthermore that \(N>d_\alpha +1\) and that \((y_1,\ldots ,y_N)\) are observations from model (7). Then the posterior measure corresponding to the informative prior designed in (4) is a well-defined (proper) probability distribution.
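The matrix \(A_*(\beta , u)\) can be assembled column-wise. For fixed \((\beta , u)\), the model is then linear in \(\alpha _* = (\alpha , \gamma )\), which is the identity used in the proof of Proposition 2. The sketch below uses hypothetical helper names and randomly generated regressors. It builds \(A_*\) and evaluates the smallest eigenvalue of \(A_*^\prime A_*\) at a single point \((\beta , u)\). The proposition requires full rank at every point of the compact set, which a single evaluation cannot establish.

```python
import numpy as np


def heating_degrees(T, u):
    """Coordinates (T_t - u) * 1{T_t <= u} (vector of heating degrees)."""
    return np.where(T <= u, T - u, 0.0)


def build_A_star(beta, u, A, B, C, T):
    """N x (d_alpha + 1) matrix with rows [(B_t beta + C_t) A_t, heating degree_t]."""
    scale = B @ beta + C  # length-N vector B_t beta + C_t
    return np.column_stack([scale[:, None] * A, heating_degrees(T, u)])


def smallest_gram_eigenvalue(M):
    """Smallest eigenvalue of M'M (eigvalsh returns them in ascending order)."""
    return float(np.linalg.eigvalsh(M.T @ M)[0])


# Hypothetical simulated regressors: N = 50, d_alpha = 3, d_beta = 2.
rng = np.random.default_rng(0)
N = 50
A = rng.normal(size=(N, 3))
B = rng.uniform(size=(N, 2))
C = rng.uniform(1.0, 2.0, size=N)
T = rng.uniform(-5.0, 25.0, size=N)
beta, u = np.array([0.3, 0.4]), 15.0
alpha, gamma = np.array([1.0, -2.0, 0.5]), -3.0

A_star = build_A_star(beta, u, A, B, C, T)
lam = smallest_gram_eigenvalue(A_star)  # > 0 means full rank at this (beta, u)

# For fixed (beta, u), f(theta) = A_* (alpha, gamma): the model is linear in alpha_*.
f_direct = (A @ alpha) * (B @ beta + C) + gamma * heating_degrees(T, u)
f_linear = A_star @ np.append(alpha, gamma)
```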

Proof

First notice that \( \int {\pi (\theta , k, l, q, r | y)} \,\mathrm {d}\sigma ^{2} \) is proportional to

$$\begin{aligned} \Vert y - f(\theta )\Vert _2^{-N} 1\!\!1_{[0,\,1]}(\Vert \beta \Vert _1)1\!\!1_{[\underline{u},\,\overline{u}]}(u) \pi (\theta |k, l)\pi (k | q, r)\pi (l)\pi (q)\pi (r), \end{aligned}$$

for almost every \(y\), and that the function \(\theta \mapsto \Vert y - f(\theta )\Vert _2^{-N}\) is bounded for almost every \(y\). The posterior is therefore integrable as soon as \(\pi (\theta |k, l)\pi (k | q, r)\pi (l)\pi (q)\pi (r)\) is itself a proper distribution, which is the case here. \(\square \)

Proposition 2

Under the same assumptions as in Proposition 1, the posterior measure corresponding to the non-informative prior \(\pi (\theta , \sigma ^2) \propto \sigma ^{-2}\) is also a well-defined (proper) probability distribution.
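For the prior \(\pi (\theta , \sigma ^2) \propto \sigma ^{-2}\), the integration in \(\sigma ^2\) that opens the proof below can be made explicit. Assume i.i.d. Gaussian errors \(\epsilon _t \sim \mathcal {N}(0,\sigma ^2)\); this is our reading of the \(\Vert y - f(\theta )\Vert _2^{-N}\) factor. Write \(S(\theta ) = \Vert y - f(\theta )\Vert _2^2\). The substitution \(v = S(\theta )/(2\sigma ^2)\) then turns the integral into a Gamma function:

$$\begin{aligned} \int _0^{+\infty } (\sigma ^2)^{-N/2-1} \exp \Big (-\frac{S(\theta )}{2\sigma ^2}\Big )\,\mathrm {d}\sigma ^2 = \Gamma \Big (\frac{N}{2}\Big ) \Big (\frac{S(\theta )}{2}\Big )^{-N/2} \propto \Vert y - f(\theta )\Vert _2^{-N}. \end{aligned}$$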

Proof

Notice first that

$$\begin{aligned} \int {\pi (\theta , \sigma ^2 | y)} \,\mathrm {d}\sigma ^2&\propto \Vert y - f(\theta )\Vert _2^{-N} 1\!\!1_{[0,\,1]}(\Vert \beta \Vert _1)1\!\!1_{[\underline{u},\,\overline{u}]}(u) \quad \text {for almost every } y, \end{aligned}$$

and observe then that

$$\begin{aligned} \Vert y - f(\theta )\Vert _2^2&= \sum _{t=1}^N \left[ y_{t} - (B_{t\bullet }\beta +C_{t}) A_{t\bullet }\alpha - (T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)\gamma \right] ^2. \end{aligned}$$

Let \((\beta _0, u_0) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\) and denote \(\alpha _* = (\alpha , \gamma )\). We write

$$\begin{aligned} \Vert y\! -\! f((\alpha ,\beta _0,\gamma ,u_0))\Vert _2^2&\!=\! \sum _{t=1}^N \left[ y_{t} \!-\! (B_{t\bullet }\beta _0+C_{t}) A_{t\bullet }\alpha - (T_{t}-u_0)1\!\!1_{[T_{t},\,+\infty [}(u_0)\gamma \right] ^2 \\&= \Vert y - A_*(\beta _0, u_0) \alpha _*\Vert _2^2, \end{aligned}$$

and thus obtain the following equivalence, as \((\beta , u)\rightarrow (\beta _0,u_0)\) and \(\Vert \alpha _*\Vert _2\rightarrow +\infty \)

$$\begin{aligned} \Vert y - f(\theta )\Vert _2^{-N} \sim \Vert y - A_*(\beta _0,u_0) \alpha _*\Vert _2^{-N}. \end{aligned}$$
(8)

The reverse triangle inequality \(\big | \Vert a\Vert _2 - \Vert b\Vert _2 \big | \le \Vert a - b\Vert _2\), applied to the right-hand side of (8), gives

$$\begin{aligned} \Vert y - A_*(\beta _0,u_0) \alpha _*\Vert _2^{-N}&\le \big | \Vert y\Vert _2 - \Vert A_*(\beta _0,u_0) \alpha _*\Vert _2 \big |^{-N}. \end{aligned}$$
(9)

Since \(A_*^\prime (\beta _0,u_0) A_*(\beta _0,u_0)\) has full rank, its smallest eigenvalue \(\lambda \) is strictly positive, and the Rayleigh quotient bound gives

$$\begin{aligned} \lambda \Vert \alpha _*\Vert _2^2&\le \Vert A_* (\beta _0,u_0)\alpha _*\Vert _2^2. \end{aligned}$$

Hence, as soon as \(\lambda ^{1/2}\Vert \alpha _*\Vert _2 > \Vert y\Vert _2\), the right-hand side of (9) is bounded above by a quantity whose equivalent as \(\Vert \alpha _*\Vert _2\rightarrow +\infty \) is known:

$$\begin{aligned} \big | \Vert y\Vert _2 - \Vert A_*(\beta _0,u_0) \alpha _*\Vert _2 \big |^{-N} \le \big ( \lambda ^{1/2}\Vert \alpha _*\Vert _2 - \Vert y\Vert _2 \big )^{-N} \sim \lambda ^{-N/2} \Vert \alpha _*\Vert _2^{-N}. \end{aligned}$$
(10)
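The two facts behind (10) can be checked numerically. The first is the Rayleigh bound \(\lambda \Vert \alpha _*\Vert _2^2 \le \Vert A_*\alpha _*\Vert _2^2\). The second is the upper bound, asymptotically equivalent to \(\lambda ^{-N/2}\Vert \alpha _*\Vert _2^{-N}\). In this sketch a random Gaussian matrix `M` merely stands in for \(A_*(\beta _0,u_0)\), and all data are hypothetical. Quantities are compared on the log scale to avoid manipulating extremely small numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 20, 4
M = rng.normal(size=(N, d))  # hypothetical stand-in for A_*(beta_0, u_0)
y = rng.normal(size=N)
lam = float(np.linalg.eigvalsh(M.T @ M)[0])  # smallest eigenvalue of M'M

# Rayleigh bound lam * ||x||^2 <= ||M x||^2, checked on random vectors x.
xs = rng.normal(size=(1000, d))
sq_norms = np.sum(xs**2, axis=1)
sq_images = np.sum((xs @ M.T) ** 2, axis=1)
rayleigh_ok = bool(np.all(lam * sq_norms <= sq_images * (1.0 + 1e-9)))


def log_terms(r, direction):
    """Logs of | ||y|| - ||Mx|| |^{-N}, (sqrt(lam) r - ||y||)^{-N} and lam^{-N/2} r^{-N}, x = r * direction."""
    ny = np.linalg.norm(y)
    nmx = np.linalg.norm(M @ (r * direction))
    log_lhs = -N * np.log(abs(ny - nmx))
    log_bound = -N * np.log(np.sqrt(lam) * r - ny)  # valid once sqrt(lam) r > ||y||
    log_equiv = -(N / 2) * np.log(lam) - N * np.log(r)
    return log_lhs, log_bound, log_equiv


direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)  # unit vector, so ||x|| = r
for r in (1e3, 1e5, 1e7):
    log_lhs, log_bound, log_equiv = log_terms(r, direction)
    # lhs <= bound, and bound / equivalent -> 1 (log ratio -> 0) as r grows
    print(f"r={r:.0e}  lhs<=bound: {log_lhs <= log_bound}  log(bound/equiv)={log_bound - log_equiv:.2e}")
```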

Combining (8), (9) and (10), we see that the integrability of the left-hand side of (8) as \((\beta , u)\rightarrow (\beta _0,u_0)\) and \(\Vert \alpha _*\Vert _2\rightarrow +\infty \) follows from that of \(\Vert \alpha _*\Vert _2^{-N}\) at infinity. Since \(\alpha _* \in \mathbb {R}^{d_\alpha +1}\), the latter holds for \(N > d_\alpha +1\), as a change from Cartesian to hyperspherical coordinates shows.
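The hyperspherical change of variables can be written out explicitly. Write \(d = d_\alpha +1\) for the dimension of \(\alpha _*\), and \(\vert \mathbb {S}^{d-1}\vert = 2\pi ^{d/2}/\Gamma (d/2)\) for the area of the unit sphere of \(\mathbb {R}^d\). For any \(M > 0\),

$$\begin{aligned} \int _{\{\Vert \alpha _*\Vert _2 > M\}} \Vert \alpha _*\Vert _2^{-N}\,\mathrm {d}\alpha _* = \vert \mathbb {S}^{d-1}\vert \int _M^{+\infty } r^{d-1-N}\,\mathrm {d}r. \end{aligned}$$

This integral is finite if and only if \(N > d = d_\alpha +1\), in which case it equals \(\vert \mathbb {S}^{d-1}\vert \, M^{d-N}/(N-d)\).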

The previous paragraph thus ensures the integrability of \(\Vert y - f(\theta )\Vert _2^{-N}\) over sets of the form

$$\begin{aligned} \{(\beta ,u)\in V(\beta _0, u_0), \Vert \alpha _*\Vert _2 \in ]M(\beta _0,u_0),\,+\infty [\}, \quad \forall (\beta _0, u_0) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}] \end{aligned}$$

where \(V(\beta _0, u_0)\) is an open neighbourhood of \((\beta _0, u_0)\) and \(M(\beta _0, u_0)\) is a real number depending on \((\beta _0, u_0)\). By compactness of \(B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\), finitely many such neighbourhoods \(V(\beta _i, u_i)\) cover \(B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\). Denoting by \(M\) the maximum of the \(M(\beta _i, u_i)\) over this finite family, we finally obtain the integrability of \(\Vert y - f(\theta )\Vert _2^{-N}\) over \(\{(\beta ,u)\in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}], \Vert \alpha _*\Vert _2 \in ]M,\,+\infty [\}\).

The integrability of \(\Vert y - f(\theta )\Vert _2^{-N}\) over \(\{(\beta ,u)\in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}], \Vert \alpha _*\Vert _2 \in [0,\,M]\}\) is immediate. For almost every \(y\), the map \(\theta \mapsto \Vert y - f(\theta )\Vert _2\) is continuous and does not vanish on this compact set. Therefore \(\theta \mapsto \Vert y - f(\theta )\Vert _2^{-N}\) is continuous, and hence bounded, on it.   \(\square \)

Remark 1

The condition “\(A_*^\prime A_*\) has full rank” is typically satisfied in our applications by the regressors used in the model. To interpret it, call “vector of heating degrees” the vector whose coordinates are \((T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)\). Failing the condition then amounts to the existence of an index \(i\) and a threshold \(u\) such that the family of vectors formed by the regressors \(A\) and the vector of heating degrees is linearly dependent over the subset \(\Psi _i\) of the calendar.
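The condition of Remark 1 might be screened numerically for a fixed \(\beta \). One approach is to scan a grid of thresholds \(u\) and record where \(A_*(\beta , u)\) loses column rank, or equivalently where \(A_*^\prime A_*\) is singular. This sketch ignores the calendar subsets \(\Psi _i\), which are defined in the main text, and uses hypothetical simulated regressors and helper names. The most obvious failure it detects is a threshold below every temperature, for which the vector of heating degrees is identically zero.

```python
import numpy as np


def heating_degrees(T, u):
    """Coordinates (T_t - u) * 1{T_t <= u} (vector of heating degrees)."""
    return np.where(T <= u, T - u, 0.0)


def rank_deficient_thresholds(A, B, C, T, beta, u_grid):
    """Thresholds u in u_grid at which A_*(beta, u) is numerically rank deficient."""
    scale = B @ beta + C
    failures = []
    for u in u_grid:
        A_star = np.column_stack([scale[:, None] * A, heating_degrees(T, u)])
        # matrix_rank uses an SVD-based tolerance; rank(A_*) = rank(A_*' A_*)
        if np.linalg.matrix_rank(A_star) < A_star.shape[1]:
            failures.append(float(u))
    return failures


# Hypothetical simulated data: temperatures between 5 and 25 degrees.
rng = np.random.default_rng(2)
N = 40
A = rng.normal(size=(N, 3))
B = rng.uniform(size=(N, 2))
C = np.ones(N)
T = rng.uniform(5.0, 25.0, size=N)
beta = np.array([0.2, 0.3])
u_grid = np.linspace(0.0, 20.0, 21)

failures = rank_deficient_thresholds(A, B, C, T, beta, u_grid)
print(failures)  # typically the grid points below min(T), where the heating-degree column vanishes
```

A grid scan can only flag problems at the thresholds it visits. It is a quick diagnostic rather than a proof that the condition holds on the whole interval \([\underline{u},\,\overline{u}]\).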


Cite this article

Launay, T., Philippe, A. & Lamarche, S. Construction of an informative hierarchical prior for a small sample with the help of historical data and application to electricity load forecasting. TEST 24, 361–385 (2015). https://doi.org/10.1007/s11749-014-0416-0
