Abstract
We are interested in the estimation and prediction of a parametric model on a short dataset, upon which it is expected to overfit and perform badly. To overcome the lack of data (relative to the dimension of the model), we propose the construction of an informative hierarchical Bayesian prior based on another, longer dataset which is assumed to share some similarities with the original, short dataset. We illustrate the performance of our prior on simulated datasets from two standard models. We then apply the methodology to a working model for electricity load forecasting on real datasets, where it leads to a substantial improvement in the quality of the predictions.
Acknowledgments
The authors would like to thank Adélaïde Priou for collecting a part of the data as well as the corresponding results, and Virginie Dordonnat for the insightful discussions.
Appendix
Using the notation \(M_{i\bullet }\) for the \(i\)th row of a matrix \(M\), the non-linear model described in (6) can be re-written in the following condensed way: for \(t = 1,\ldots ,N,\)
The matrices \(A\) of size \(N\times d_A\), \(B\) of size \(N\times d_{\beta }\), \(C\) of size \(N\times 1\), and \(T\) of size \(N\times 1\) are known exogenous variables while the parameters of the model to be estimated are
where \(B_+^{d_{\beta }}(0, 1) = \{ \beta \in (\mathbb {R}_+)^{d_{\beta }} ;\; \Vert \beta \Vert _1 \le 1 \}\) is the positive orthant of the \(\Vert \cdot \Vert _1\)-unit ball of dimension \(d_{\beta }\).
Proposition 1
For \((\beta , u) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\) denote \(A_*(\beta , u)\) the matrix whose rows are
and suppose \(A_*^\prime (\beta , u) A_*(\beta , u)\) has full rank for every \((\beta , u) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\). Assume furthermore that \(N>d_\alpha +1\) and that \((y_1,\ldots ,y_N)\) are observations coming from the model (7). The posterior measure corresponding to the informative prior designed in (4) is then a well-defined (proper) probability distribution.
Proof
First notice that \( \int {\pi (\theta , k, l, q, r | y)} \,\mathrm {d}\sigma ^{2} \) is proportional to
for almost every \(y\) and that the function \(\theta \mapsto \Vert y - f(\theta )\Vert _2^{-N}\) is bounded, for almost every \(y\). The posterior integrability is hence trivial as long as \(\pi (\theta |k, l)\pi (k | q, r)\pi (l)\pi (q)\pi (r)\) itself is a proper distribution which is the case here. \(\square \)
Proposition 2
Under the same assumptions as in Proposition 1, the posterior measure corresponding to the non-informative prior \(\pi (\theta , \sigma ^2) \propto \sigma ^{-2}\) is also a well-defined (proper) probability distribution.
Proof
Notice first that
and observe then that
Let \((\beta _0, u_0) \in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\) and denote \(\alpha _* = (\alpha , \gamma )\). We write
and thus obtain the following equivalence, as \((\beta , u)\rightarrow (\beta _0,u_0)\) and \(\Vert \alpha _*\Vert _2\rightarrow +\infty \)
The triangle inequality applied to the right-hand side of (8) gives
Since \(A_*^\prime (\beta _0,u_0) A_*(\beta _0,u_0)\) has full rank, by straightforward algebra we get
where \(\lambda \) is the smallest eigenvalue of \((A_*(\beta _0,u_0))^\prime A_*(\beta _0,u_0)\) and is strictly positive. We can hence find an equivalent of the right-hand side of (9) as \(\Vert \alpha _*\Vert _2\rightarrow +\infty \), which is
Combining (8), (9) and (10) together, we see that the integrability of the left-hand side of (8) as \((\beta , u)\rightarrow (\beta _0,u_0)\) and \(\Vert \alpha _*\Vert _2\rightarrow +\infty \) is directly implied by that of \(\Vert \alpha _*\Vert _2^{-N}\). The latter is immediate for \(N > d_\alpha +1\), as can be seen via a Cartesian to hyperspherical re-parametrisation.
The previous paragraph thus ensures the integrability of \(\Vert y - f(\theta )\Vert _2^{-N}\) over sets of the form
where the subset \(V(\beta _0, u_0)\) is an open neighbourhood of \((\beta _0, u_0)\) and \(M(\beta _0, u_0)\) is a real number depending on \((\beta _0, u_0)\). By compactness of \(B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\) there exists a finite family of such neighbourhoods \(V(\beta _i, u_i)\) whose union covers \(B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}]\). Denoting by \(M\) the maximum of \(M(\beta _i, u_i)\) over the corresponding finite subset of \((\beta _i, u_i)\), we finally obtain the integrability of \(\Vert y - f(\theta )\Vert _2^{-N}\) over \(\{(\beta ,u)\in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}],\; \Vert \alpha _*\Vert _2 \in ]M,\,+\infty [\}\).
The integrability of \(\Vert y - f(\theta )\Vert _2^{-N}\) over \(\{(\beta ,u)\in B_+^{d_{\beta }}(0, 1) \times [\underline{u},\,\overline{u}],\; \Vert \alpha _*\Vert _2 \in [0,\,M]\}\) is trivial, recalling that \(\theta \mapsto \Vert y - f(\theta )\Vert _2\) is continuous and does not vanish over this compact set for almost every \(y\), so that its reciprocal is continuous, and hence bounded, on this set. \(\square \)
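The two elementary computations on which the proofs rely can be written out explicitly. What follows is only a sketch under stated assumptions: the noise in model (7) is taken to be Gaussian with variance \(\sigma ^2\) and the prior on \(\sigma ^2\) to be proportional to \(\sigma ^{-2}\) (as in Proposition 2), and \(\alpha _*\) is taken to have dimension \(m = d_\alpha + 1\). The symbols \(m\) and \(S_{m-1}\) (the surface area of the unit sphere of \(\mathbb {R}^m\)) are introduced here for illustration only. Integrating \(\sigma ^2\) out, via the change of variable \(v = 1/\sigma ^2\), gives
\[
\int _0^{+\infty } (\sigma ^2)^{-N/2} \exp \left( -\frac{\Vert y - f(\theta )\Vert _2^2}{2\sigma ^2}\right) \sigma ^{-2}\,\mathrm {d}\sigma ^2
= \Gamma \left( \frac{N}{2}\right) \left( \frac{\Vert y - f(\theta )\Vert _2^2}{2}\right) ^{-N/2}
\propto \Vert y - f(\theta )\Vert _2^{-N},
\]
and the hyperspherical re-parametrisation \(r = \Vert \alpha _*\Vert _2\) gives, for any \(M > 0\),
\[
\int _{\{\Vert \alpha _*\Vert _2 > M\}} \Vert \alpha _*\Vert _2^{-N}\,\mathrm {d}\alpha _*
= S_{m-1} \int _M^{+\infty } r^{-N}\, r^{m-1}\,\mathrm {d}r
= S_{m-1}\, \frac{M^{m-N}}{N-m},
\]
where the last equality holds, and the integral is finite, if and only if \(N > m = d_\alpha + 1\).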
Remark 1
The condition “\(A_*^\prime A_*\) has full rank” mentioned above is typically satisfied in our applications for the regressors used in our model. To see this, call “vector of heating degrees” the vector whose coordinates are \((T_{t}-u)1\!\!1_{[T_{t},\,+\infty [}(u)\). Then not satisfying the aforementioned condition is equivalent to saying that there exists an index \(i\) and a threshold \(u\) such that the family of vectors formed by the regressors \(A\) and the vector of heating degrees is linearly dependent over the subset \(\Psi _i\) of the calendar.
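The linear-dependence criterion of Remark 1 can be screened numerically. The following Python/NumPy sketch is not the authors' procedure: the function names `heating_degrees` and `full_rank_on_grid` and the `rows` argument (standing in for a calendar subset \(\Psi _i\)) are hypothetical. It checks only the family named in the remark (the columns of \(A\) together with the vector of heating degrees) rather than building \(A_*\) itself, and a finite grid of thresholds can reveal a violation but cannot prove the condition for every \(u\).

```python
import numpy as np


def heating_degrees(T, u):
    """Vector with coordinates (T_t - u) * 1{T_t <= u}, as defined in Remark 1."""
    T = np.asarray(T, dtype=float)
    return np.where(T <= u, T - u, 0.0)


def full_rank_on_grid(A, T, u_grid, rows=None):
    """Return True if [A, heating_degrees(T, u)] has full column rank
    for every threshold u in u_grid.

    rows: optional index array restricting the check to a subset of the
    observations (hypothetical stand-in for a calendar subset Psi_i).
    """
    A = np.asarray(A, dtype=float)
    T = np.asarray(T, dtype=float)
    if rows is not None:
        A, T = A[rows], T[rows]
    for u in u_grid:
        X = np.column_stack([A, heating_degrees(T, u)])
        # Rank deficiency means the regressors and the heating-degree
        # vector are linearly dependent for this threshold.
        if np.linalg.matrix_rank(X) < X.shape[1]:
            return False
    return True


# Toy data (illustrative only): an intercept and a linear trend as regressors.
A_toy = np.column_stack([np.ones(6), np.arange(6.0)])
T_toy = np.array([0.0, 5.0, 1.0, 7.0, 2.0, 9.0])

ok = full_rank_on_grid(A_toy, T_toy, u_grid=[4.0])
# With a threshold below every temperature the heating-degree vector is
# identically zero, so the family is linearly dependent.
degenerate = full_rank_on_grid(A_toy, T_toy, u_grid=[-1.0])
```

In the toy data the second call illustrates the typical failure mode: a threshold outside the range of observed temperatures on the chosen subset makes the heating-degree column vanish.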
Cite this article
Launay, T., Philippe, A. & Lamarche, S. Construction of an informative hierarchical prior for a small sample with the help of historical data and application to electricity load forecasting. TEST 24, 361–385 (2015). https://doi.org/10.1007/s11749-014-0416-0
DOI: https://doi.org/10.1007/s11749-014-0416-0