Construction of an informative hierarchical prior for a small sample with the help of historical data and application to electricity load forecasting

Launay, Tristan; Philippe, Anne; Lamarche, Sophie

doi:10.1007/s11749-014-0416-0

Original Paper
Published: 11 November 2014

TEST, Volume 24, pages 361–385 (2015)

Tristan Launay¹˒², Anne Philippe¹ & Sophie Lamarche²

Abstract

We are interested in the estimation and prediction of a parametric model on a short dataset upon which it is expected to overfit and perform badly. To overcome the lack of data (relative to the dimension of the model), we propose the construction of an informative hierarchical Bayesian prior based on another, longer dataset which is assumed to share some similarities with the original, short dataset. We illustrate the performance of our prior on simulated datasets from two standard models. We then apply the methodology to a working model for electricity load forecasting on real datasets, where it leads to a substantial improvement in the quality of the predictions.

Acknowledgments

The authors would like to thank Adélaïde Priou for collecting a part of the data as well as the corresponding results, and Virginie Dordonnat for the insightful discussions.

Author information

Authors and Affiliations

Laboratoire de Mathématiques Jean Leray, Université de Nantes, 2 Rue de la Houssinière, BP 92208, 44322, Nantes Cedex 3, France
Tristan Launay & Anne Philippe
Electricité de France R&D, 1 Avenue du Général de Gaulle, 92141, Clamart Cedex, France
Tristan Launay & Sophie Lamarche

Corresponding author

Correspondence to Anne Philippe.

Appendix

Using the notation $M_{i\bullet }$ for the $i$th row of a matrix $M$, the non-linear model described in (6) can be re-written in the following condensed way: for $t = 1,\ldots ,N,$

$$\begin{aligned} y_t&= (A_{t\bullet } \alpha ) (B_{t\bullet } \beta + C_{t}) + \gamma (T_t - u)1\!\!1_{[T_t,\,+\infty [}(u) + \epsilon _t \nonumber \\&= f_t(\theta ) + \epsilon _t. \end{aligned}$$

(7)

The matrices $A$ of size $N\times d_{\alpha }$, $B$ of size $N\times d_{\beta }$, $C$ of size $N\times 1$, and $T$ of size $N\times 1$ are known exogenous variables, while the parameters of the model to be estimated are

$$\begin{aligned} \eta = (\theta ,\sigma ^2) = (\alpha , \beta , \gamma , u, \sigma ^2) \in \mathbb {R}^{d_\alpha } \times B_+^{d_\beta }(0, 1) \times \mathbb {R}^* \times [\underline{u},\,\overline{u}] \times \mathbb {R}_+^*, \end{aligned}$$

where $B_+^{d_{\beta }}(0, 1) = \{ \beta \in (\mathbb {R}_+)^{d_{\beta }} ;\; \Vert \beta \Vert _1 \le 1 \}$ is the non-negative orthant of the $\Vert \cdot \Vert _1$-unit ball of dimension $d_{\beta }$.
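As a reading aid, the mean function $f_t(\theta )$ of model (7) can be evaluated in a few lines. The following is a minimal sketch, not the authors' implementation: the function name `load_model_mean` is hypothetical, and $C$ and $T$ are assumed to be stored as one-dimensional arrays of length $N$ rather than $N\times 1$ matrices.

```python
import numpy as np

def load_model_mean(alpha, beta, gamma, u, A, B, C, T):
    """Evaluate f_t(theta) of model (7) for t = 1..N (hypothetical helper)."""
    seasonal = A @ alpha                 # A_{t.} alpha
    shape = B @ beta + C                 # B_{t.} beta + C_t
    # (T_t - u) * 1_{[T_t, +inf[}(u): non-zero only when T_t <= u
    heating = (T - u) * (T <= u)
    return seasonal * shape + gamma * heating

# Tiny made-up example with N = 3, d_alpha = 2, d_beta = 1
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = np.array([[0.5], [0.2], [0.1]])
C = np.ones(3)
T = np.array([10.0, 20.0, 15.0])
f = load_model_mean(np.array([2.0, 3.0]), np.array([0.4]), -2.0, 16.0, A, B, C, T)
```

Only the observations whose temperature lies at or below the threshold $u$ contribute to the $\gamma$ term, which is how the indicator in (7) acts.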

Proposition 1

For $(\beta , u) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]$ denote by $A_*(\beta , u)$ the matrix whose rows are

$$\begin{aligned}(A_*)_{t\bullet }(\beta , u) = \left[ ( B_{t\bullet }\beta + C_{t})A_{t\bullet }, (T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)\right] , \quad t=1,\ldots ,N,\end{aligned}$$

and suppose $A_*^\prime (\beta , u) A_*(\beta , u)$ has full rank for every $(\beta , u) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]$. Assume furthermore that $N>d_\alpha +1$ and that $(y_1,\ldots ,y_N)$ are observations coming from the model (7). The posterior measure corresponding to the informative prior designed in (4) is then a well-defined (proper) probability distribution.

Proof

First notice that $ \int {\pi (\theta , k, l, q, r | y)} \,\mathrm {d}\sigma ^{2} $ is proportional to

$$\begin{aligned} \Vert y - f(\theta )\Vert _2^{-N} 1\!\!1_{[0,\,1]}(\Vert \beta \Vert _1)1\!\!1_{[\underline{u},\,\overline{u}]}(u) \pi (\theta |k, l)\pi (k | q, r)\pi (l)\pi (q)\pi (r), \end{aligned}$$

for almost every $y$, and that the function $\theta \mapsto \Vert y - f(\theta )\Vert _2^{-N}$ is bounded for almost every $y$. Posterior integrability therefore follows as long as $\pi (\theta |k, l)\pi (k | q, r)\pi (l)\pi (q)\pi (r)$ is itself a proper distribution, which is the case here. $\square $

Proposition 2

Under the same assumptions as in Proposition 1, the posterior measure corresponding to the non-informative prior $\pi (\theta , \sigma ^2) \propto \sigma ^{-2}$ is also a well-defined (proper) probability distribution.
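The proof starts by integrating $\sigma ^2$ out of the posterior. As a sketch of that step, assuming i.i.d. Gaussian errors $\epsilon _t \sim \mathcal {N}(0, \sigma ^2)$ (an assumption consistent with the form of the marginal used below, but not restated in this appendix), the change of variable $s = \sigma ^{-2}$ turns the integral into a Gamma integral:

$$\begin{aligned} \int _0^{+\infty } (\sigma ^2)^{-N/2} \exp \left( -\frac{\Vert y - f(\theta )\Vert _2^2}{2\sigma ^2}\right) \sigma ^{-2} \,\mathrm {d}\sigma ^2 = \Gamma \left( \tfrac{N}{2}\right) \left( \frac{\Vert y - f(\theta )\Vert _2^2}{2}\right) ^{-N/2} \propto \Vert y - f(\theta )\Vert _2^{-N}. \end{aligned}$$

The indicator functions appearing in the marginal posterior simply restrict $\beta $ and $u$ to their domains.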

Proof

Notice first that

$$\begin{aligned} \int {\pi (\theta , \sigma ^2 | y)} \,\mathrm {d}\sigma ^2&\propto \Vert y - f(\theta )\Vert _2^{-N} 1\!\!1_{[0,\,1]}(\Vert \beta \Vert _1)1\!\!1_{[\underline{u},\,\overline{u}]}(u) \quad \text {for almost every } y, \end{aligned}$$

and observe then that

$$\begin{aligned} \Vert y - f(\theta )\Vert _2^2&= \sum _{t=1}^N \left[ y_{t} - (B_{t\bullet }\beta +C_{t}) A_{t\bullet }\alpha - (T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)\gamma \right] ^2. \end{aligned}$$

Let $(\beta _0, u_0) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]$ and denote $\alpha _* = (\alpha , \gamma )$. We write

$$\begin{aligned} \Vert y\! -\! f((\alpha ,\beta _0,\gamma ,u_0))\Vert _2^2&\!=\! \sum _{t=1}^N \left[ y_{t} \!-\! (B_{t\bullet }\beta _0+C_{t}) A_{t\bullet }\alpha - (T_{t}-u_0)1\!\!1_{[T_{t},\,+\infty [}(u_0)\gamma \right] ^2 \\&= \Vert y - A_*(\beta _0, u_0) \alpha _*\Vert _2^2, \end{aligned}$$

and thus obtain the following equivalence, as $(\beta , u)\rightarrow (\beta _0,u_0)$ and $\Vert \alpha _*\Vert _2\rightarrow +\infty $

$$\begin{aligned} \Vert y - f(\theta )\Vert _2^{-N} \sim \Vert y - A_*(\beta _0,u_0) \alpha _*\Vert _2^{-N}. \end{aligned}$$

(8)

The reverse triangle inequality applied to the right-hand side of (8) gives

$$\begin{aligned} \Vert y - A_*(\beta _0,u_0) \alpha _*\Vert _2^{-N}&\le \big | \Vert y\Vert _2 - \Vert A_*(\beta _0,u_0) \alpha _*\Vert _2 \big |^{-N}. \end{aligned}$$

(9)

Since $A_*^\prime (\beta _0,u_0) A_*(\beta _0,u_0)$ has full rank, by straightforward algebra we get

$$\begin{aligned} \lambda \Vert \alpha _*\Vert _2^2&\le \Vert A_* (\beta _0,u_0)\alpha _*\Vert _2^2, \end{aligned}$$

where $\lambda $ is the smallest eigenvalue of $(A_*(\beta _0,u_0))^\prime A_*(\beta _0,u_0)$ and is strictly positive. We can hence find an equivalent of the right-hand side of (9) as $\Vert \alpha _*\Vert _2\rightarrow +\infty $, which is

$$\begin{aligned} \big | \Vert y\Vert _2 - \Vert A_*(\beta _0,u_0) \alpha _*\Vert _2 \big |^{-N} \sim \lambda ^{-N/2} \Vert \alpha _*\Vert _2^{-N}. \end{aligned}$$

(10)

Combining (8), (9) and (10) together, we see that the integrability of the left-hand side of (8) as $(\beta , u)\rightarrow (\beta _0,u_0)$ and $\Vert \alpha _*\Vert _2\rightarrow +\infty $ is directly implied by that of $\Vert \alpha _*\Vert _2^{-N}$. The latter is immediate for $N > d_\alpha +1$, as can be seen via a Cartesian to hyperspherical re-parametrisation.

The previous paragraph thus ensures the integrability of $\Vert y - f(\theta )\Vert _2^{-N}$ over sets of the form

$$\begin{aligned} \{(\beta ,u)\in V(\beta _0, u_0), \Vert \alpha _*\Vert _2 \in ]M(\beta _0,u_0),\,+\infty [\}, \quad \forall (\beta _0, u_0) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}] \end{aligned}$$

where the subset $V(\beta _0, u_0)$ is an open neighbourhood of $(\beta _0, u_0)$ and $M(\beta _0, u_0)$ is a real number depending on $(\beta _0, u_0)$. By compactness of $B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]$ there exists a finite family of such $V(\beta _i, u_i)$ whose union covers $B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]$. Denoting by $M$ the maximum of $M(\beta _i, u_i)$ over the corresponding finite subset of $(\beta _i, u_i)$, we finally obtain the integrability of $\Vert y - f(\theta )\Vert _2^{-N}$ over $\{(\beta ,u)\in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}], \Vert \alpha _*\Vert _2 \in ]M,\,+\infty [\}$.

The integrability of $\Vert y - f(\theta )\Vert _2^{-N}$ over $\{(\beta ,u)\in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}], \Vert \alpha _*\Vert _2 \in [0,\,M]\}$ is trivial, recalling that $\theta \mapsto \Vert y - f(\theta )\Vert _2$ is continuous and does not vanish over this compact set for almost every $y$, meaning that its inverse shares these same properties. $\square $
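The integrability of $\Vert \alpha _*\Vert _2^{-N}$ at infinity, invoked in the proof for $N > d_\alpha + 1$, can be made explicit. Writing $d = d_\alpha + 1$ for the dimension of $\alpha _* = (\alpha , \gamma )$ and using the standard surface area of the unit sphere of $\mathbb {R}^d$, the hyperspherical change of variables gives, for any $M > 0$,

$$\begin{aligned} \int _{\Vert \alpha _*\Vert _2 > M} \Vert \alpha _*\Vert _2^{-N} \,\mathrm {d}\alpha _* = \frac{2\pi ^{d/2}}{\Gamma (d/2)} \int _M^{+\infty } r^{d-1-N} \,\mathrm {d}r = \frac{2\pi ^{d/2}}{\Gamma (d/2)} \, \frac{M^{d-N}}{N-d}, \end{aligned}$$

which is finite exactly when $N > d = d_\alpha + 1$; the radial integral diverges otherwise.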

Remark 1

The condition “$A_*^\prime A_*$ has full rank” mentioned above is typically satisfied in our applications for the regressors used in our model. To see this, call “vector of heating degrees” the vector whose coordinates are $(T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)$. Failing to satisfy the condition is then equivalent to the existence of an index $i$ and a threshold $u$ such that the family of vectors formed by the regressors $A$ and the vector of heating degrees is linearly dependent over the subset $\Psi _i$ of the calendar.
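In practice the full-rank condition can be checked numerically for given regressors. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the helper names `design_matrix` and `smallest_eigenvalue` are hypothetical, $C$ and $T$ are stored as one-dimensional arrays, and the data are made up. It builds $A_*(\beta , u)$ row by row as in Proposition 1 and returns the smallest eigenvalue $\lambda $ of $A_*^\prime A_*$, which is strictly positive exactly when the condition holds.

```python
import numpy as np

def design_matrix(beta, u, A, B, C, T):
    """Rows [(B_t beta + C_t) A_t, (T_t - u) 1{T_t <= u}] of A_*(beta, u)."""
    shape = B @ beta + C
    heating = (T - u) * (T <= u)         # vector of heating degrees
    return np.column_stack([shape[:, None] * A, heating])

def smallest_eigenvalue(beta, u, A, B, C, T):
    """Smallest eigenvalue of A_*' A_* (eigvalsh sorts in ascending order)."""
    A_star = design_matrix(beta, u, A, B, C, T)
    return np.linalg.eigvalsh(A_star.T @ A_star)[0]

# Made-up regressors with N = 3, d_alpha = 2, d_beta = 1
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = np.array([[0.5], [0.2], [0.1]])
C = np.ones(3)
T = np.array([10.0, 20.0, 15.0])
beta = np.array([0.4])

lam_ok = smallest_eigenvalue(beta, 16.0, A, B, C, T)   # some T_t below u
lam_bad = smallest_eigenvalue(beta, 5.0, A, B, C, T)   # no T_t below u
```

In this toy example the condition fails for `u = 5.0` because every temperature exceeds the threshold, so the vector of heating degrees is identically zero. Scanning a grid of $(\beta , u)$ values with `smallest_eigenvalue` gives a quick, though not exhaustive, diagnostic.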

About this article

Cite this article

Launay, T., Philippe, A. & Lamarche, S. Construction of an informative hierarchical prior for a small sample with the help of historical data and application to electricity load forecasting. TEST 24, 361–385 (2015). https://doi.org/10.1007/s11749-014-0416-0

Received: 17 February 2014
Accepted: 27 October 2014
Published: 11 November 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s11749-014-0416-0
