Open Access, BY-NC-ND 3.0 license. Published by De Gruyter Open Access, December 31, 2013

Prediction of time series by statistical learning: general losses and fast rates

  • Pierre Alquier, Xiaoyin Li and Olivier Wintenberger
From the journal Dependence Modeling

Abstract

We establish rates of convergence in statistical learning for time series forecasting. Using the PAC-Bayesian approach, slow rates of convergence √(d/n) for the Gibbs estimator under the absolute loss were given in a previous work [7], where n is the sample size and d the dimension of the set of predictors. Under the same weak dependence conditions, we extend this result to any convex Lipschitz loss function. We also identify a condition on the parameter space that ensures similar rates for the classical penalized ERM procedure. We apply this method to quantile forecasting of the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and for uniformly mixing processes, we prove that the Gibbs estimator actually achieves fast rates of convergence d/n. We discuss the optimality of these different rates, pointing out references to lower bounds when they are available. In particular, these results generalize the results of [29] on sparse regression estimation to some autoregressive settings.
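As a rough illustration of the kind of procedure the abstract refers to, the following minimal sketch builds a Gibbs-type (exponentially weighted) aggregate over a finite grid of AR(1) predictors, scored by the quantile (pinball) loss, which is convex and Lipschitz. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the finite grid `thetas`, the uniform prior, and the default inverse temperature `lam = sqrt(n)` are all hypothetical choices.

```python
import numpy as np


def pinball_loss(pred, y, tau):
    """Quantile (pinball) loss at level tau; convex and Lipschitz in pred."""
    diff = y - pred
    return np.maximum(tau * diff, (tau - 1.0) * diff)


def gibbs_weights(emp_risks, lam, prior=None):
    """Weights proportional to prior * exp(-lam * empirical risk)."""
    emp_risks = np.asarray(emp_risks, dtype=float)
    if prior is None:
        # Assumption: uniform prior over the finite set of predictors.
        prior = np.full(emp_risks.shape, 1.0 / emp_risks.size)
    log_w = np.log(prior) - lam * emp_risks
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()


def gibbs_ar1_forecast(x, thetas, tau=0.5, lam=None):
    """Aggregate the predictors x_t -> theta * x_t with exponential weights.

    Returns the aggregated one-step-ahead forecast and the weights.
    """
    x = np.asarray(x, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    past, target = x[:-1], x[1:]
    n = target.size
    if lam is None:
        # Hypothetical default; the tuning used in the paper is not
        # stated in this text.
        lam = np.sqrt(n)
    preds = thetas[:, None] * past[None, :]  # shape (len(thetas), n)
    risks = pinball_loss(preds, target[None, :], tau).mean(axis=1)
    w = gibbs_weights(risks, lam)
    forecast = float(np.dot(w, thetas) * x[-1])
    return forecast, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.zeros(200)
    for t in range(1, series.size):
        series[t] = 0.6 * series[t - 1] + rng.normal()
    forecast, weights = gibbs_ar1_forecast(series, np.linspace(-0.9, 0.9, 19))
    print(forecast, weights.round(3))
```

With `tau=0.5` the pinball loss is half the absolute loss, so the weights concentrate on the grid points whose predictions have the smallest mean absolute error; a larger `lam` makes the aggregate closer to plain empirical risk minimization over the grid.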

References

[1] A. Agarwal and J. C. Duchi, The generalization ability of online algorithms for dependent data, IEEE Trans. Inform. Theory 59 (2011), no. 1, 573–587.

[2] H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory (B. N. Petrov and F. Csaki, eds.), Budapest: Akademia Kiado, 1973, pp. 267–281.

[3] P. Alquier and K. Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electron. J. Stat. 5 (2011), 127–145.

[4] P. Alquier, PAC-Bayesian bounds for randomized empirical risk minimizers, Math. Methods Statist. 17 (2008), no. 4, 279–304.

[5] K. B. Athreya and S. G. Pantula, Mixing properties of Harris chains and autoregressive processes, J. Appl. Probab. 23 (1986), no. 4, 880–892. MR 867185 (88c:60127). doi:10.2307/3214462

[6] J.-Y. Audibert, Fast rates in statistical inference through aggregation, Ann. Statist. 35 (2007), no. 2, 1591–1646.

[7] P. Alquier and O. Wintenberger, Model selection for weakly dependent time series forecasting, Bernoulli 18 (2012), no. 3, 883–913.

[8] G. Biau, O. Biau, and L. Rouvière, Nonparametric forecasting of the manufacturing output growth with firm-level survey data, Journal of Business Cycle Measurement and Analysis 3 (2008), 317–332. doi:10.1787/jbcma-v2007-art15-en

[9] A. Belloni and V. Chernozhukov, L1-penalized quantile regression in high-dimensional sparse models, Ann. Statist. 39 (2011), no. 1, 82–130.

[10] P. Brockwell and R. Davis, Time series: Theory and methods (2nd edition), Springer, 2009.

[11] E. Britton, P. Fisher, and J. Whitley, The inflation report projections: Understanding the fan chart, Bank of England Quarterly Bulletin 38 (1998), no. 1, 30–37.

[12] L. Birgé and P. Massart, Gaussian model selection, J. Eur. Math. Soc. 3 (2001), no. 3, 203–268.

[13] G. Biau and B. Patra, Sequential quantile prediction of time series, IEEE Trans. Inform. Theory 57 (2011), 1664–1674. doi:10.1109/TIT.2011.2104610

[14] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp, Aggregation for Gaussian regression, Ann. Statist. 35 (2007), no. 4, 1674–1697.

[15] O. Catoni, A PAC-Bayesian approach to adaptative classification, preprint (2003).

[16] O. Catoni, Statistical learning theory and stochastic optimization, Springer Lecture Notes in Mathematics, 2004. doi:10.1007/b99352

[17] O. Catoni, PAC-Bayesian supervised classification (the thermodynamics of statistical learning), Lecture Notes-Monograph Series, vol. 56, IMS, 2007.

[18] N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, Cambridge University Press, New York, 2006. doi:10.1017/CBO9780511546921

[19] L. Clavel and C. Minodier, A monthly indicator of the French business climate, Documents de Travail de la DESE, 2009.

[20] M. Cornec, Constructing a conditional GDP fan chart with an application to French business survey data, 30th CIRET Conference, New York, 2010.

[21] N. V. Cuong, L. S. Tung Ho, and V. Dinh, Generalization and robustness of batched weighted average algorithm with V-geometrically ergodic Markov data, Proceedings of ALT’13 (S. Jain, R. Munos, F. Stephan, and T. Zeugmann, eds.), Springer, 2013, pp. 264–278. doi:10.1007/978-3-642-40935-6_19

[22] J. C. Duchi, A. Agarwal, M. Johansson, and M. I. Jordan, Ergodic mirror descent, SIAM J. Optim. 22 (2012), no. 4, 1549–1578.

[23] J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur, Weak dependence, examples and applications, Lecture Notes in Statistics, vol. 190, Springer-Verlag, Berlin, 2007. doi:10.1007/978-0-387-69952-3

[24] M. Devilliers, Les enquêtes de conjoncture, Archives et Documents, no. 101, INSEE, 1984.

[25] E. Dubois and E. Michaux, Étalonnages à l’aide d’enquêtes de conjoncture: de nouveaux résultats, Économie et Prévision, no. 172, INSEE, 2006. doi:10.3917/ecop.172.0011

[26] P. Doukhan, Mixing, Lecture Notes in Statistics, Springer, New York, 1994. doi:10.1007/978-1-4612-2642-0

[27] K. Dowd, The inflation fan charts: An evaluation, Greek Economic Review 23 (2004), 99–111.

[28] A. Dalalyan and J. Salmon, Sharp oracle inequalities for aggregation of affine estimators, Ann. Statist. 40 (2012), no. 4, 2327–2355.

[29] A. Dalalyan and A. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Mach. Learn. 72 (2008), 39–61. doi:10.1007/s10994-008-5051-0

[30] F. X. Diebold, A. S. Tay, and K. F. Wallis, Evaluating density forecasts of inflation: the survey of professional forecasters, Discussion Paper No. 48, ESRC Macroeconomic Modelling Bureau, University of Warwick and Working Paper No. 6228, National Bureau of Economic Research, Cambridge, Mass., 1997.

[31] M. D. Donsker and S. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. III, Comm. Pure Appl. Math. 29 (1976), 389–461. doi:10.1002/cpa.3160290405

[32] P. Doukhan and O. Wintenberger, Weakly dependent chain with infinite memory, Stochastic Process. Appl. 118 (2008), no. 11, 1997–2013.

[33] R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of variance of United Kingdom inflation, Econometrica 50 (1982), 987–1008. doi:10.2307/1912773

[34] C. Francq and J.-M. Zakoian, GARCH models: Structure, statistical inference and financial applications, Wiley-Blackwell, 2010. doi:10.1002/9780470670057

[35] S. Gerchinovitz, Sparsity regret bounds for individual sequences in online linear regression, Proceedings of COLT’11, 2011.

[36] J. Hamilton, Time series analysis, Princeton University Press, 1994. doi:10.1515/9780691218632

[37] H. Hang and I. Steinwart, Fast learning from α-mixing observations, Technical report, Fakultät für Mathematik und Physik, Universität Stuttgart, 2012.

[38] I. A. Ibragimov, Some limit theorems for stationary processes, Theory Probab. Appl. 7 (1962), no. 4, 349–382.

[39] A. B. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, Recursive aggregation of estimators by the mirror descent algorithm with averaging, Probl. Inf. Transm. 41 (2005), no. 4, 368–384.

[40] A. B. Juditsky, P. Rigollet, and A. B. Tsybakov, Learning by mirror averaging, Ann. Statist. 36 (2008), no. 5, 2183–2206.

[41] R. Koenker and G. Bassett Jr., Regression quantiles, Econometrica 46 (1978), 33–50. doi:10.2307/1913643

[42] R. Koenker, Quantile regression, Cambridge University Press, Cambridge, 2005. doi:10.1017/CBO9780511754098

[43] S. Kullback, Information theory and statistics, Wiley, New York, 1959.

[44] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Information and Computation 108 (1994), 212–261. doi:10.1006/inco.1994.1009

[45] P. Massart, Concentration inequalities and model selection - École d’été de probabilités de Saint-Flour XXXIII - 2003, Lecture Notes in Mathematics - J. Picard Editor, vol. 1896, Springer, 2007.

[46] D. A. McAllester, PAC-Bayesian model averaging, Proc. of the 12th Annual Conf. on Computational Learning Theory, Santa Cruz, California (Electronic), ACM, New York, 1999, pp. 164–170. doi:10.1145/307400.307435

[47] R. Meir, Nonparametric time series prediction through adaptive model selection, Mach. Learn. 39 (2000), 5–34. doi:10.1023/A:1007602715810

[48] C. Minodier, Avantages comparés des séries premières valeurs publiées et des séries des valeurs révisées, Documents de Travail de la DESE, 2010.

[49] D. S. Modha and E. Masry, Memory-universal prediction of stationary random processes, IEEE Trans. Inform. Theory 44 (1998), no. 1, 117–133.

[50] S. P. Meyn and R. L. Tweedie, Markov chains and stochastic stability, Communications and Control Engineering Series, Springer-Verlag London Ltd., London, 1993. MR 1287609 (95j:60103). doi:10.1007/978-1-4471-3267-7

[51] A. Nemirovski, Topics in nonparametric statistics, Lectures on Probability Theory and Statistics - École d’été de probabilités de Saint-Flour XXVIII (P. Bernard, ed.), Springer, 2000, pp. 85–277.

[52] R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, 2008.

[53] E. Rio, Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes, C. R. Math. Acad. Sci. Paris 330 (2000), 905–908. doi:10.1016/S0764-4442(00)00290-1

[54] P.-M. Samson, Concentration of measure inequalities for Markov chains and φ-mixing processes, Ann. Probab. 28 (2000), no. 1, 416–461.

[55] I. Steinwart and A. Christmann, Fast learning from non-i.i.d. observations, Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, eds.), 2009, pp. 1768–1776.

[56] I. Steinwart, D. Hush, and C. Scovel, Learning from dependent observations, J. Multivariate Anal. 100 (2009), 175–194. doi:10.1016/j.jmva.2008.04.001

[57] Y. Seldin, F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor, J. Peters, and P. Auer, PAC-Bayesian inequalities for martingales, IEEE Trans. Inform. Theory 58 (2012), no. 12, 7086–7093.

[58] A. Sanchez-Perez, Time series prediction via aggregation: an oracle bound including numerical cost, Preprint arXiv:1311.4500, 2013.

[59] G. Stoltz, Agrégation séquentielle de prédicteurs : méthodologie générale et applications à la prévision de la qualité de l’air et à celle de la consommation électrique, Journal de la SFDS 151 (2010), no. 2, 66–106.

[60] J. Shawe-Taylor and R. Williamson, A PAC analysis of a Bayes estimator, Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT’97, ACM, 1997, pp. 2–9. doi:10.1145/267460.267466

[61] N. N. Taleb, Black swans and the domains of statistics, Amer. Statist. 61 (2007), no. 3, 198–200.

[62] A. S. Tay and K. F. Wallis, Density forecasting: a survey, J. Forecast. 19 (2000), 235–254. doi:10.1002/1099-131X(200007)19:4<235::AID-FOR772>3.0.CO;2-L

[63] V. Vapnik, The nature of statistical learning theory, Springer, 1999. doi:10.1007/978-1-4757-3264-1

[64] V.G. Vovk, Aggregating strategies, Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT), 1990, pp. 372–283.

[65] O. Wintenberger, Deviation inequalities for sums of weakly dependent time series, Electron. Commun. Probab. 15 (2010), 489–503.

[66] Y.-L. Xu and D.-R. Chen, Learning rate of regularized regression for exponentially strongly mixing sequence, J. Statist. Plann. Inference 138 (2008), 2180–2189. doi:10.1016/j.jspi.2007.09.003

[67] B. Zou, L. Li, and Z. Xu, The generalization performance of ERM algorithm with strongly mixing observations, Mach. Learn. 75 (2009), 275–295. doi:10.1007/s10994-009-5104-z

Received: 2013-10-23
Accepted: 2013-12-08
Published Online: 2013-12-31
Published in Print: 2013-01-01

©2013 Olivier Wintenberger et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Downloaded on 27.4.2024 from https://www.degruyter.com/document/doi/10.2478/demo-2013-0004/html