Abstract
Forecasting and forecast evaluation are inherently sequential tasks. Predictions are often issued on a regular basis, such as every hour, day, or month, and their quality is monitored continuously. However, the classical statistical tools for forecast evaluation are static, in the sense that statistical tests for forecast calibration are only valid if the evaluation period is fixed in advance. Recently, e-values have been introduced as a new, dynamic method for assessing statistical significance. An e-value is a nonnegative random variable with expected value at most one under a null hypothesis. Large e-values give evidence against the null hypothesis, and the multiplicative inverse of an e-value is a conservative p-value. Since they naturally lead to statistical tests that are valid under optional stopping, e-values are particularly suitable for sequential forecast evaluation. This article proposes e-values for testing probabilistic calibration of forecasts, which is one of the most important notions of calibration. The proposed methods are also more generally applicable for sequential goodness-of-fit testing. We demonstrate in a simulation study that the e-values are competitive in terms of power when compared to extant methods that do not allow for sequential testing. In this context, we introduce test power heat matrices, a graphical tool to compactly visualize results of simulation studies on test power. In a case study we show that the e-values provide new and useful insights in the evaluation of probabilistic weather forecasts.
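To illustrate the idea of the abstract, the following is a minimal sketch (not the authors' proposed method) of a sequential e-value test for probabilistic calibration. Under the null, probability integral transform (PIT) values are uniform on [0, 1]; the betting factor 1 + λ(2z − 1) is nonnegative for |λ| ≤ 1 and has expectation one under uniformity, so its running product is an e-process whose value at any stopping time is a valid e-value. The choice λ = 0.5 and the power transform used to mimic miscalibration are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def e_process(pits, lam=0.5):
    """Running product of betting factors 1 + lam*(2z - 1).

    For |lam| <= 1 each factor is nonnegative and has expectation 1
    when z is uniform on [0, 1], so the cumulative product is an
    e-process: stopping at any time yields a valid e-value.
    """
    factors = 1.0 + lam * (2.0 * np.asarray(pits) - 1.0)
    return np.cumprod(factors)

# Calibrated forecasts: PIT values are uniform, the e-process stays small.
e_null = e_process(rng.uniform(size=2000))

# Miscalibrated forecasts: PIT values skewed toward 1 (illustrative).
e_alt = e_process(rng.uniform(size=2000) ** 0.5)

# By Ville's inequality, rejecting once the e-process exceeds 1/alpha
# gives a level-alpha sequential test.
alpha = 0.05
print("final e-value (calibrated):", e_null[-1])
print("threshold 1/alpha crossed (miscalibrated):",
      bool((e_alt >= 1 / alpha).any()))
```

Because the test remains valid under optional stopping, monitoring can continue as new forecasts arrive, and evaluation may be halted as soon as the e-process crosses 1/α.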
Funding Statement
This work was supported by the Swiss National Science Foundation.
Acknowledgments
The authors are grateful to Sebastian Lerch for providing data for the case study and thank Timo Dimitriadis, Tilmann Gneiting, and the members of his group for valuable discussions and inputs. Valuable comments by Aaditya Ramdas and an anonymous reviewer helped us to improve this article. Computations have been performed on UBELIX (https://ubelix.unibe.ch/), the HPC cluster of the University of Bern.
Citation
Sebastian Arnold, Alexander Henzi, Johanna F. Ziegel. "Sequentially valid tests for forecast calibration." Ann. Appl. Stat. 17(3): 1909-1935, September 2023. https://doi.org/10.1214/22-AOAS1697