An adaptive volatility method for probabilistic forecasting and its application to the M6 financial forecasting competition

In this paper, we address the problem of probabilistic forecasting using an adaptive volatility method rooted in classical time-varying volatility models and leveraging online stochastic optimization algorithms. These principles were successfully applied in the M6 forecasting competition under the team name AdaGaussMC. Our approach takes a distinctive path by embracing the Efficient Market Hypothesis (EMH) instead of trying to beat the market directly. We focus on estimating the efficient market's probabilistic forecast, emphasizing the importance of online forecasting in adapting to the dynamic nature of financial markets. The three key points of our approach are: (a) applying the univariate time-varying volatility model AdaVol, (b) obtaining probabilistic forecasts of future returns, and (c) optimizing the competition metrics using stochastic gradient-based algorithms. We contend that the simplicity of our approach contributes to its robustness and consistency. Our performance in the M6 competition resulted in an overall 7th rank, with a noteworthy 5th position in the forecasting task. Given the simplicity of our approach, this result underscores the efficacy of our adaptive volatility method for probabilistic forecasting.


Introduction
In financial time-series analysis, forecasting is an indispensable element [4,5,15,16]. The ability to accurately predict future values of financial time series is essential for decision-makers in the complex landscape of financial markets. Building upon the legacy of the five preceding M competitions, each dedicated to advancing time-series forecasting methods [18][19][20][21][22][23], the M6 competition is a pivotal chapter in the exploration of forecasting methodologies [24]. With a specific focus on scrutinizing the tenets of the Efficient Market Hypothesis (EMH), the M6 competition aims to provide valuable insights for researchers and practitioners interested in the relationship between probabilistic forecasting and investment decision-making. Its primary objective is to shed new light on the EMH, a hypothesis positing that share prices reflect all relevant information. This notion implies that consistently outperforming the market is not feasible.
The M6 competition was composed of two parts: probabilistic forecasting and investment decision-making. The participants underwent a rigorous live evaluation that took place monthly over twelve months. In the forecasting part, the goal was to predict, for each financial asset, the probability that its return would fall in each of the five quintiles (1st to 5th). The investment part involved deciding whether or not to invest based on the forecasted probabilities. This component added complexity, as participants had not only to decide on individual investments but also to construct portfolios consistent with their forecasts. The investment universe was diverse, comprising 50 S&P500 stocks and 50 international Exchange Traded Funds (ETFs) spanning various asset categories and countries.
Instead of challenging the EMH, our approach was to embrace it, particularly under a non-stationary market paradigm. In the M6 competition, our strategy had a twofold focus: (i) embracing the principles of an efficient market and (ii) adapting to the dynamics of a non-stationary market.
For the EMH, we asked a fundamental question: What would an efficient market do?
Our methodology exclusively employed time-series methods, focusing on the daily returns of the assets and avoiding any external data. By estimating the future distribution of returns from historical data, our goal was to evaluate the efficient market's probabilistic forecast. Building on the principle that an efficient market implies the same expected return for a given level of risk, we directed our efforts towards modeling volatility. Forecasting volatility from historical data was key to assessing the efficient market's perception of the risk associated with each asset. This approach formed the foundation of our probabilistic forecast submission to the M6 competition.
To adapt to the challenges of a non-stationary market, we employed online learning [6]. This methodology enabled us to dynamically adjust our model to evolving market conditions, ensuring the robustness of our forecasting methodology in the face of changing trends and uncertainties. Integrating efficient market principles with an adaptive response to market dynamics makes our approach a versatile and responsive method for probabilistic forecasting in financial time-series analysis.
While there are many ways to estimate volatility, we advocate for online learning methods [6]. This paradigm has been applied in various fields and performed particularly well in recent electricity load forecasting competitions [11,28]. The strength of this adaptive approach is that it accounts for regime changes in the data, i.e., non-stationarity. Applied to volatility [29], it can account for temporary breaks in the data, such as the periods of very high volatility during the recent COVID crisis. Our hypothesis is that, given the non-stationary nature of markets, online learning procedures are necessary for forecasting future returns in an efficient market.
This paper provides an in-depth discussion of our online methodological approach, which secured the 5th rank in the forecasting task and the 7th rank overall in the M6 competition. It thus serves as a discussion paper outlining our approach and its results in the two tasks of the competition. Notably, our strategy consistently surpassed the naive benchmark in probabilistic forecasting, showcasing the effectiveness of online volatility models, particularly AdaVol [29]. We emphasize the frugality of our methods, asserting that simplicity contributes to the robustness and consistency of our results.
Organization. In Section 2, we introduce our approach to the M6 competition with simple methods that beat the naive benchmark in the forecasting task. In Section 3, we present the adaptive volatility model AdaVol. This is followed by the application of AdaVol to the M6 competition in Section 4. Section 5 discusses the performance of our methodology in the M6 competition.

Beating the M6 Forecasting Benchmark with Low Risk
The M6 financial forecasting competition aimed at assessing the EMH. To that end, it considered a financial universe of 100 assets: half stocks and half ETFs. The competition consisted of two tasks: probabilistic forecasting, where the objective was to rank the future returns of the 100 assets in a probabilistic fashion, and investment decision-making, where competitors proposed portfolio allocations. Each task had a 4-week horizon and there were 12 submissions to make, so the competition lasted roughly a year.
Our intuition is that it is extremely hard to beat the market. However, we noticed that the benchmark proposed by the competition organizers did not really follow the market, and we therefore understood that there were simple ways to outperform it.
We focus on the probabilistic forecasting task. The objective was to rank the 100 assets probabilistically. More specifically, at each evaluation point, the assets were split into five quintiles of 20 assets according to their 4-week returns. Participants were asked to assess the probability of each asset falling in each quintile, that is, 500 (discrete) probabilities. The naive benchmark was uniform, that is, 20% for each asset and each quintile. However, the EMH does not state that the distribution of the future return should be the same for every financial asset. Rather, the EMH proposes that the expected return of each financial asset should be the same for a given level of risk. In other words, some assets are more volatile than others, and a higher return is expected for these risky assets.
In the competition, the universe was composed of 50 stocks and 50 ETFs. Generally, stocks are much more volatile than ETFs, which are diversified. Therefore, it seems natural to assign ETFs a probability higher than 20% of falling in the middle quintile, and lower than 20% of falling in the extreme quintiles. More precisely, we display the historical quintile frequencies in Table 1. We classify the assets into five classes: Stocks, ETF Equities, ETF Fixed income, ETF Commodities and ETF Volatility. Then, we compute the frequency of each quintile for each class. For instance, during the considered period, stocks appeared in the extreme quintiles 28.7% and 23.7% of the time, but in the middle one only 14.4% of the time.
We propose two very simple benchmarks to motivate our approach; their gain over the competition benchmark helps explain our performance.
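The class-wise quintile frequencies summarized in Table 1 can be computed with a few lines of code. The following Python sketch is illustrative only: the function names, the tie-breaking rule and the toy data (in which hypothetical "stocks" are three times as volatile as hypothetical "ETFs") are our assumptions, not the data or code behind Table 1.

```python
import random
from collections import defaultdict


def quintile_labels(returns):
    """Map each asset's 4-week return to a quintile in 1..5 (1 = best 20%).

    Assumes len(returns) is a multiple of 5 (100 assets -> 20 per quintile).
    Ties are broken by asset index, an arbitrary choice for this sketch.
    """
    n = len(returns)
    size = n // 5
    order = sorted(range(n), key=lambda a: returns[a], reverse=True)
    labels = [0] * n
    for rank, a in enumerate(order):
        labels[a] = rank // size + 1
    return labels


def class_quintile_frequencies(periods, asset_class):
    """Frequency of each quintile per asset class (a Table-1-style summary).

    periods[t][a]: 4-week return of asset a in period t.
    asset_class[a]: class label of asset a (hypothetical labels here).
    """
    counts = defaultdict(lambda: [0] * 5)
    for returns in periods:
        for a, q in enumerate(quintile_labels(returns)):
            counts[asset_class[a]][q - 1] += 1
    return {c: [k / sum(v) for k in v] for c, v in counts.items()}


# Toy data: hypothetical "stocks" are three times as volatile as "ETFs".
random.seed(0)
classes = ["stock"] * 50 + ["etf"] * 50
periods = [[random.gauss(0.0, 3.0 if c == "stock" else 1.0) for c in classes]
           for _ in range(200)]
freq = class_quintile_frequencies(periods, classes)
for c, f in freq.items():
    print(c, [round(p, 3) for p in f])
```

On this toy data, the high-volatility class should concentrate in the extreme quintiles and the low-volatility class in the middle one, mirroring the pattern described above.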
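As presented above, the probabilistic forecasting task at time $t$ consists in submitting a matrix of $100 \times 5$ entries, which we denote by $M_t$. The loss function used by the competition is the Ranked Probability Score (RPS) between the submission and the realized outcome, which we denote by $Q_t$ (the one-hot matrix of observed quintiles). Our first observation is that the historical frequencies of $Q_t$ are not uniform; see Table 1. Therefore, a very simple idea is to use the best constant matrix $\widehat{M}_{\mathcal{T}}$ with respect to the historical RPS. For some period $\mathcal{T} \subset \mathbb{N}$, we compute
$$\widehat{M}_{\mathcal{T}} \in \arg\min_{M} \sum_{t \in \mathcal{T}} \mathrm{RPS}(M, Q_t).$$
We observe that $\widehat{M}_{\mathcal{T}}$ is equivalently defined by
$$\widehat{M}_{\mathcal{T}} = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} Q_t.$$
Indeed, the RPS is a quadratic loss between linear transforms (cumulative sums over quintiles) of $M$ and $Q_t$; since this transform is invertible, the minimizer is the average of the historical outcomes. Therefore, a simple benchmark, named best constant, consists in defining a training set $\mathcal{T}$ (for instance, years 2015 to 2020) and constantly submitting $\widehat{M}_{\mathcal{T}}$.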
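To make the best constant benchmark concrete, here is a minimal Python sketch. It assumes the M6 form of the RPS as we understand it: for each asset, the mean over the five quintiles of the squared difference between cumulative submitted and cumulative realized probabilities, then averaged over assets. The outcome generator `random_outcome` draws rankings uniformly at random and is purely hypothetical; it only serves to check that the averaged matrix never does worse than the uniform submission on the history it was fitted to.

```python
import random


def rps(M, Q):
    """Ranked Probability Score, averaged over assets (assumed M6 form).

    M[a][k]: submitted probability that asset a falls in quintile k.
    Q[a][k]: 1 if asset a actually fell in quintile k, else 0.
    Per asset: (1/5) * sum_j (cumulative M up to j - cumulative Q up to j)^2.
    """
    total = 0.0
    for m_row, q_row in zip(M, Q):
        cm = cq = s = 0.0
        for m, q in zip(m_row, q_row):
            cm += m
            cq += q
            s += (cm - cq) ** 2
        total += s / len(m_row)
    return total / len(M)


def best_constant(history):
    """Entry-wise average of past outcome matrices Q_t.

    The RPS is quadratic in the cumulative sums, an invertible linear map of M,
    so this average minimizes the summed historical RPS over all matrices M.
    """
    T = len(history)
    n_assets, n_q = len(history[0]), len(history[0][0])
    return [[sum(Q[a][k] for Q in history) / T for k in range(n_q)]
            for a in range(n_assets)]


def random_outcome(n_assets=100):
    """Hypothetical outcome: a uniformly random ranking, 20 assets per quintile."""
    order = random.sample(range(n_assets), n_assets)
    Q = [[0] * 5 for _ in range(n_assets)]
    for rank, a in enumerate(order):
        Q[a][rank * 5 // n_assets] = 1
    return Q


def avg_rps(M, history):
    """Average RPS of a constant submission M over a list of outcomes."""
    return sum(rps(M, Q) for Q in history) / len(history)


random.seed(1)
history = [random_outcome() for _ in range(50)]
uniform = [[0.2] * 5 for _ in range(100)]
M_hat = best_constant(history)
print(round(avg_rps(uniform, history), 4))  # 0.16 for any outcome with 20 assets per quintile
print(avg_rps(M_hat, history) <= avg_rps(uniform, history))  # True: in-sample minimizer
```

Because `best_constant` is the exact minimizer over all matrices, it can only tie or beat the uniform matrix in sample; its out-of-sample advantage depends, of course, on how stable the quintile frequencies are.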
This very simple method is purely based on the competition metric. However, our fundamental goal is probabilistic time-series forecasting. We aim to forecast the log-returns of each asset over the 4-week horizon (a matrix $R_t \in \mathbb{R}^{20 \times 100}$ for any instant $t$, i.e., 20 trading days for 100 assets).
Our first approach is to predict the marginal distribution of each component and assume independence. We fit a Gaussian distribution for each marginal on a training set $\mathcal{T}$: the daily log-return of asset $a$ is predicted by $r_{t,a} \sim \mathcal{N}(\hat\mu_a, \hat\sigma_a^2)$. Relying on the EMH, we do not claim to predict $\hat\mu_a$ better than the market does. We assume that for a given risk the expected return should be the same. We make the approximation that the risk is the same within a class of financial assets, and therefore that expected returns are similar within a class. We set four classes: stocks, ETF equities, ETF fixed income and ETF commodities (the volatility ETF is treated separately, see below); $\hat\mu_a$ is then the average return of all assets of the corresponding class during years 2015 to 2020.
This yields a probabilistic forecast $R_t \sim P_t$. Based on that forecast, we minimize the expected RPS as follows:
$$M_t \in \arg\min_{M} \mathbb{E}_{R_t \sim P_t}\left[\mathrm{RPS}(M, Q_t)\right],$$
with $Q_t$ defined by the ranks of $R_t$. The RPS is convex in $M$; therefore, we minimize it simply with stochastic gradient descent with an annealing step size, using Monte-Carlo simulations of $R_t$; see Section 4.2.
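The expected-RPS minimization can be sketched as projected stochastic gradient descent with Monte-Carlo gradients. This Python sketch is not the authors' code. It assumes i.i.d. Gaussian daily log-returns (so the 4-week return of asset $a$ is $\mathcal{N}(20\hat\mu_a, 20\hat\sigma_a^2)$), a step size $\eta_0/\sqrt{k}$, mini-batches of 8 simulated outcomes, and a Euclidean projection of each row of $M$ onto the probability simplex to keep valid probabilities; these choices and the 10-asset toy parameters are ours.

```python
import math
import random


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sample_outcome(mu, sigma, rng, horizon=20):
    """One-hot quintile matrix Q from one simulated 4-week outcome.

    Assumes i.i.d. Gaussian daily log-returns, independent across assets, so the
    horizon log-return of asset a is N(horizon * mu_a, horizon * sigma_a^2).
    """
    n = len(mu)
    r = [rng.gauss(horizon * m, math.sqrt(horizon) * s) for m, s in zip(mu, sigma)]
    order = sorted(range(n), key=lambda a: r[a], reverse=True)
    Q = [[0.0] * 5 for _ in range(n)]
    for rank, a in enumerate(order):
        Q[a][rank * 5 // n] = 1.0
    return Q


def row_rps_and_grad(m_row, q_row):
    """Per-asset RPS (1/5) sum_j (cumM_j - cumQ_j)^2 and its gradient w.r.t. m_row."""
    diffs, cm, cq = [], 0.0, 0.0
    for m, q in zip(m_row, q_row):
        cm += m
        cq += q
        diffs.append(cm - cq)
    value = sum(d * d for d in diffs) / 5
    # Entry k enters every cumulative sum with index j >= k.
    grad = [2.0 / 5 * sum(diffs[k:]) for k in range(5)]
    return value, grad


def minimize_expected_rps(mu, sigma, rng, steps=2000, batch=8, eta0=0.3):
    """Projected SGD with annealing step eta0 / sqrt(k) on Monte-Carlo gradients.

    Gradients are those of the per-asset score (the RPS times the number of
    assets), which only rescales the step size.
    """
    n = len(mu)
    M = [[0.2] * 5 for _ in range(n)]  # start from the uniform benchmark
    for k in range(1, steps + 1):
        grads = [[0.0] * 5 for _ in range(n)]
        for _ in range(batch):
            Q = sample_outcome(mu, sigma, rng)
            for a in range(n):
                _, g = row_rps_and_grad(M[a], Q[a])
                grads[a] = [G + gj / batch for G, gj in zip(grads[a], g)]
        step = eta0 / math.sqrt(k)
        M = [project_simplex([m - step * g for m, g in zip(M[a], grads[a])])
             for a in range(n)]
    return M


def expected_rps(M, mu, sigma, rng, n_samples=2000):
    """Monte-Carlo estimate of E[RPS(M, Q)] under the Gaussian forecast."""
    total = 0.0
    for _ in range(n_samples):
        Q = sample_outcome(mu, sigma, rng)
        total += sum(row_rps_and_grad(M[a], Q[a])[0] for a in range(len(M)))
    return total / (n_samples * len(M))


# Toy setting: 10 hypothetical assets sharing a class mean,
# five volatile "stocks" followed by five calm "ETFs".
rng = random.Random(2)
mu = [0.0003] * 10
sigma = [0.04] * 5 + [0.01] * 5
M = minimize_expected_rps(mu, sigma, rng)
print(expected_rps(M, mu, sigma, rng), "vs uniform 0.16")
```

The optimized rows should shift mass towards the extreme quintiles for the volatile assets and towards the middle quintile for the calm ones, which is exactly the effect the uniform benchmark ignores.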
We evaluate on the last 9 months of 2021, that is, the data available before the start of the competition without missing values for the assets. The best constant benchmark achieves an RPS of 0.1570, while the naive M6 benchmark has an RPS of 0.16. Our optimization of the expected RPS based on a Gaussian distribution yields a very close RPS of 0.1571. Finally, we observe that the log-returns of the final asset (volatility ETF) are far from Gaussian (cf. Figure 1); more importantly, they are strongly asymmetric, which makes a symmetric distribution suboptimal. Therefore, we define a hybrid forecast that keeps our Gaussian distribution for the first 99 assets and draws the last one uniformly from its past values (2015 to 2020). This achieves an RPS of 0.1567. The differences between these benchmarks seem very small; however, they are meaningful, since the differences in performance between the top teams of the competition were of similar magnitude. The gap between the competition's naive benchmark and the best constant is larger than the gap between the best constant and the top-performing team.
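The value 0.16 of the naive benchmark can be checked by hand. Assuming the RPS is, for each asset, the mean over the five quintiles of the squared differences between cumulative submitted and cumulative realized probabilities, averaged over assets (our reading of the M6 definition), the uniform submission has cumulative probabilities $0.2j$, so an asset that falls in quintile $k$ scores
$$\mathrm{RPS}_a(k) = \frac{1}{5}\sum_{j=1}^{5}\left(0.2\,j - \mathbb{1}\{j \ge k\}\right)^2,$$
$$\mathrm{RPS}_a(1) = \mathrm{RPS}_a(5) = \tfrac{1}{5}(0.64 + 0.36 + 0.16 + 0.04) = 0.24, \qquad \mathrm{RPS}_a(2) = \mathrm{RPS}_a(4) = \tfrac{1}{5}(0.04 + 0.36 + 0.16 + 0.04) = 0.12, \qquad \mathrm{RPS}_a(3) = \tfrac{1}{5}(0.04 + 0.16 + 0.16 + 0.04) = 0.08.$$
Since exactly 20 assets fall in each quintile, the average over the 100 assets is
$$\frac{0.24 + 0.12 + 0.08 + 0.12 + 0.24}{5} = 0.16,$$
so the uniform benchmark scores exactly 0.16 at every submission point, whatever the realized ranking.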
From these simple benchmarks, we draw two conclusions. First, while it is hard to beat the market, it is easy to beat the uniform benchmark because different assets are associated with different risks; assets with high volatility (stocks) are more likely to fall in the extreme quintiles than assets with low volatility (ETFs). Second, excluding the last asset (volatility ETF), it is possible to model each asset's distribution with a Gaussian, where we do not focus on forecasting the mean but the variance; indeed, we apply a simple assumption on the mean (assets of the same class have the same expected return), and we aim to capture the different behaviors through volatility forecasting. Our Gaussian forecasting model may seem very naive. Nonetheless, its advantage is to provide a framework to which more sophisticated methods can be applied. Indeed, we rely on online learning to estimate the volatility. The objective is to capture the evolution of each asset's volatility to enhance our probabilistic forecast. This is presented in the next section.

AdaVol: an Adaptive Volatility Method
Volatility in financial time series is time-varying and frequently exhibits clustering. The quest to model and predict this volatility has led to the exploration of various methodologies, with non-linear time-series models often taking center stage. Among these, the AutoRegressive Conditional Heteroskedasticity (ARCH) model and the Generalized ARCH (GARCH) model are the best known [2,8].
However, the GARCH model relies on the assumption of stationarity, a premise that may not hold for real-world financial data. The inherent non-stationarity of financial time series calls for alternative approaches, with a natural inclination towards adaptive methods for robust volatility modeling. To address this, we consider AdaVol [29], an online volatility method designed to handle the challenges posed by time-varying volatility in financial data.
AdaVol departs from traditional stationarity assumptions and embraces adaptability as a cornerstone for modeling volatility dynamics. By leveraging the principles of online learning, AdaVol addresses the limitations of GARCH and offers a flexible framework to capture the evolution of volatility over time. This departure from stationarity assumptions aligns AdaVol with the inherent characteristics of financial time series, where volatility is known to evolve dynamically. Specifically, AdaVol's adaptability during regime changes, as evidenced in Werge and Wintenberger [29, Figure 8], and its capacity to react to major events like the COVID-19 pandemic, as illustrated in Werge and Wintenberger [29, Figure 9], highlight its ability to address the complexities of time-varying volatility in financial data.
In its simplest form, AdaVol is a GARCH-like model whose statistical inference is carried out with the Quasi-Maximum Likelihood (QML) procedure, recursively updated using stochastic gradient-based algorithms [3]. This enables AdaVol to update its estimates as new data arrive, ensuring a responsive and dynamic adaptation to changing volatility patterns. Unlike GARCH, AdaVol is well suited to the non-stationarity inherent in financial time series.
The usual approach for estimating the parameters $\theta = (\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)^\top \in \mathbb{R}_+^{p+q}$ is the QML estimator [1,10,27]. Here, the goal is to minimize the quasi-likelihood function $L_n(\theta)$ defined by
$$L_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} l_i(\theta), \qquad l_i(\theta) = \frac{x_i^2}{\sigma_i^2(\theta)} + \log \sigma_i^2(\theta),$$
where $x_i$ denotes the (centered) return observed at time $i$. Note that these parameters $\theta$ enter the volatility process $\sigma_t^2(\theta)$ used to make volatility forecasts. Commonly, iterative estimation procedures, e.g., quasi-Newton methods [25], are used to minimize $L_n(\theta)$. Roughly, each iteration has a computational cost of $O(n(p+q))$, making the minimization cost $O(nm(p+q))$, where $m$ is the number of iterations.
Algorithm 1: AdaVol [29]
As new data arrive, this becomes prohibitively expensive and increasingly inefficient. Furthermore, iterative optimization tools are ill-suited to financial data, as data often arrive in large quantities and at high frequency. Stochastic optimization procedures are clearly advantageous here since observations are processed one by one [3]; this is very scalable, as each update costs only $O(p+q)$ computations (compared to $O(nm(p+q))$ for the batch minimization). In online QML estimation, the parameter estimate is updated based exclusively on the previous estimate and the new observation. This is computationally efficient, as each new observation only needs to be processed once.
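For reference, here is a minimal Python sketch of the batch objective in the GARCH(1,1) case ($p = q = 1$). Because $\theta$ above contains only the $\alpha$ and $\beta$ coefficients, this sketch assumes variance targeting, $\omega = (1 - \alpha - \beta)\gamma$ with $\gamma$ the sample variance; that choice, the simulation parameters and the function names are our assumptions. Each call to `qml_loss` is one $O(n)$ pass, so a batch optimizer evaluating it over $m$ iterations pays the $O(nm(p+q))$ cost discussed above, and must start over whenever new data arrive.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate x_t = sigma_t z_t, sigma_t^2 = omega + alpha x_{t-1}^2 + beta sigma_{t-1}^2."""
    rng = random.Random(seed)
    s2, x, xs = omega / (1.0 - alpha - beta), 0.0, []
    for _ in range(n):
        s2 = omega + alpha * x * x + beta * s2
        x = math.sqrt(s2) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def qml_loss(theta, xs, gamma):
    """Gaussian QML objective L_n(theta) = (1/n) sum_i x_i^2 / sigma_i^2 + log sigma_i^2.

    theta = (alpha, beta). Assumption of this sketch: variance targeting,
    omega = (1 - alpha - beta) * gamma. One call is a single O(n) pass.
    """
    alpha, beta = theta
    omega = (1.0 - alpha - beta) * gamma
    s2, x_prev, total = gamma, 0.0, 0.0
    for x in xs:
        s2 = omega + alpha * x_prev ** 2 + beta * s2
        total += x * x / s2 + math.log(s2)
        x_prev = x
    return total / len(xs)


xs = simulate_garch11(20000, omega=1e-5, alpha=0.1, beta=0.85)
gamma = sum(x * x for x in xs) / len(xs)
print(qml_loss((0.1, 0.85), xs, gamma), qml_loss((0.0, 0.0), xs, gamma))
```

On data simulated from a GARCH(1,1), the loss at the generating parameters should be clearly lower than at the constant-variance point $\alpha = \beta = 0$.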
For AdaVol, our optimization strategy relies on first-order stochastic optimization, using AdaGrad learning rates [7]. To enforce the parameter space constraints, we use a projected version of this scheme. This combination not only enhances the convergence speed of the optimization process but also guarantees that the estimated parameters remain within the valid parameter space, reinforcing the stability and reliability of AdaVol's volatility forecasts. Specifically, AdaVol minimizes $L_n(\theta)$ by $\theta_n$, which is derived from the recursion
$$\theta_i = P_\Theta\left(\theta_{i-1} - \frac{\eta}{\sqrt{\epsilon + \sum_{j=1}^{i} \nabla_\theta l_j(\theta_{j-1})^2}} \odot \nabla_\theta l_i(\theta_{i-1})\right), \qquad i = 1, \ldots, n,$$
where $\eta > 0$ is a constant learning rate, $\epsilon > 0$ is a small number ensuring positivity of the denominator, $P_\Theta$ is the projection onto the parameter space $\Theta$, and the square root and division are taken element-wise.
Algorithm 1 describes AdaVol in more detail. Note that $\nabla_\theta l_i(\theta_{i-1})^2$ denotes the element-wise square $\nabla_\theta l_i(\theta_{i-1}) \odot \nabla_\theta l_i(\theta_{i-1})$. Additionally, a practical implementation of AdaVol is available on GitHub.¹
AdaVol's architecture has demonstrated its efficacy in generating robust and adaptable forecasts. Its ability to adjust to time-varying parameters proves advantageous in non-stationary settings. Additionally, AdaVol stands out for its computational and memory efficiency, as it only uses the preceding (GARCH) estimate to process new observations. This streamlined approach requires a single pass through the observations, minimizing computational overhead.
In Werge and Wintenberger [29, Appendix B], the authors conducted a relative computational speed comparison, showing that AdaVol is approximately 205 times faster than standard estimation of the GARCH(1, 1) model for a sample size of n = 1000 [29, Table B.4]. Furthermore, they observed that the relative speed gain of AdaVol increases with the sample size.
Note that financial data commonly arrive in time-varying mini-batches. A straightforward extension of AdaVol to this setting results in a computational cost of only $O(b_t(p+q))$, where $b_t$ denotes the number of observations arriving at time $t$. Moreover, the use of time-varying mini-batches has been shown to improve the estimation procedure [13,14].
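The following Python class is a minimal sketch of an AdaVol-style online update for a GARCH(1,1), written for illustration; it is not the authors' GitHub implementation. It follows the projected AdaGrad recursion on the QML loss $l_i(\theta) = x_i^2/\sigma_i^2(\theta) + \log\sigma_i^2(\theta)$, propagating the gradient of $\sigma_i^2$ recursively. Our assumptions: variance targeting $\omega = (1-\alpha-\beta)\gamma$ with $\gamma$ a running mean of $x_i^2$ (treated as a constant when differentiating), the parameter space $\Theta = \{\alpha, \beta \ge 0,\ \alpha + \beta \le 0.999\}$ with Euclidean projection, and the default values of $\eta$ and $\epsilon$. Each call to `update` costs $O(p+q) = O(1)$ operations.

```python
import math
import random


def project_theta(alpha, beta, bound=0.999):
    """Euclidean projection onto Theta = {alpha >= 0, beta >= 0, alpha + beta <= bound}."""
    a, b = max(alpha, 0.0), max(beta, 0.0)
    if a + b <= bound:
        return a, b
    # Otherwise project onto the segment {a + b = bound, a >= 0, b >= 0}.
    shift = (alpha + beta - bound) / 2.0
    a, b = alpha - shift, beta - shift
    if a < 0.0:
        return 0.0, bound
    if b < 0.0:
        return bound, 0.0
    return a, b


class AdaVolSketch:
    """Online GARCH(1,1) QML with projected AdaGrad steps (illustrative sketch)."""

    def __init__(self, init_var, alpha=0.05, beta=0.90, eta=0.05, eps=1e-8):
        self.alpha, self.beta = project_theta(alpha, beta)
        self.eta, self.eps = eta, eps
        self.gamma, self.count = init_var, 1  # running mean of x^2 (variance target)
        self.sigma2 = init_var                # one-step-ahead forecast sigma_i^2
        self.dsig = [0.0, 0.0]                # d sigma_i^2 / d(alpha, beta)
        self.g2 = [0.0, 0.0]                  # AdaGrad sum of squared gradients

    def update(self, x):
        """Observe x_i, take one projected AdaGrad step, return sigma_{i+1}^2."""
        # Gradient of l_i = x^2 / sigma^2 + log sigma^2 through sigma^2.
        dl = (1.0 - x * x / self.sigma2) / self.sigma2
        grad = [dl * d for d in self.dsig]
        self.g2 = [G + g * g for G, g in zip(self.g2, grad)]
        step = [self.eta * g / math.sqrt(self.eps + G) for g, G in zip(grad, self.g2)]
        self.alpha, self.beta = project_theta(self.alpha - step[0], self.beta - step[1])

        # O(1) recursions: variance target, forecast and its derivatives.
        self.count += 1
        self.gamma += (x * x - self.gamma) / self.count
        omega = (1.0 - self.alpha - self.beta) * self.gamma
        prev = self.sigma2
        self.sigma2 = omega + self.alpha * x * x + self.beta * prev
        self.dsig = [-self.gamma + x * x + self.beta * self.dsig[0],
                     -self.gamma + prev + self.beta * self.dsig[1]]
        return self.sigma2


# Toy data simulated from a GARCH(1,1) with alpha = 0.1, beta = 0.85.
rng = random.Random(0)
s2, x, xs = 2e-4, 0.0, []
for _ in range(5000):
    s2 = 1e-5 + 0.1 * x * x + 0.85 * s2
    x = math.sqrt(s2) * rng.gauss(0.0, 1.0)
    xs.append(x)

model = AdaVolSketch(init_var=sum(v * v for v in xs[:250]) / 250)
forecasts = [model.update(v) for v in xs[250:]]
print(round(model.alpha, 3), round(model.beta, 3))
```

Feeding the returns one by one yields a one-step-ahead variance forecast after each observation, while the projection keeps the parameters in $\Theta$ and hence the forecasts positive.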
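One natural way to write the mini-batch extension is sketched below; it is our assumption rather than the exact scheme of [13,14]. The gradients of the $b_t$ observations in the batch $B_t$ are averaged at the current parameter, followed by a single projected AdaGrad step:
$$g_t = \frac{1}{b_t}\sum_{i \in B_t} \nabla_\theta l_i(\theta_{t-1}), \qquad \theta_t = P_\Theta\left(\theta_{t-1} - \frac{\eta}{\sqrt{\epsilon + \sum_{s=1}^{t} g_s^2}} \odot g_t\right).$$
Computing $g_t$ takes $b_t$ gradient evaluations of cost $O(p+q)$ each, which gives the $O(b_t(p+q))$ cost per time step; with $b_t = 1$ the recursion reduces to the one-observation update.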

Back to M6 Financial Forecasting Competition
As presented in Section 2, the M6 financial forecasting competition aimed at assessing the EMH. It consisted of two tasks: probabilistic forecasting and investment decision-making.
We did not use domain-specific financial knowledge; our strategy was based solely on probabilistic time-series forecasting and stochastic optimization. Indeed, each task was evaluated by a metric, and our objective was to optimize it.
We developed a strategy in two steps.First, as our aim is probabilistic time-series forecasting, we obtain such probabilistic forecasts using AdaVol.Then, our second step consists in optimizing the expected loss function with respect to the submission based on these probabilistic forecasts.

Probabilistic forecasting based on AdaVol
Our objective is to forecast the returns of each asset over the 4-week horizon (a matrix $R_t \in \mathbb{R}^{20 \times 100}$). We first predict the marginal distribution of each component; then we combine them either under an independence assumption or after estimating correlations.
The prediction of the marginals fits in the setting of univariate time-series forecasting. We denote by $r_{t,a}$ the log-return of asset $a$ at time $t$. We apply AdaVol to $(r_{t,a} - \hat\mu_a)_t$, assumed independent, where $\hat\mu_a$ is the estimated mean return per class defined in Section 2. The Gaussian application of AdaVol yields $r_{t,a} \sim \mathcal{N}(\hat\mu_a, \hat\sigma_{t,a}^2)$. At submission point $i$, at time $t_i$, we have a volatility forecast $\hat\sigma_{t_i,a}$, and our model becomes fixed over the 4-week horizon:
$$r_{t,a} \sim \mathcal{N}(\hat\mu_a, \hat\sigma_{t_i,a}^2), \qquad t_i < t \le t_i + 20.$$
We treat the final asset (volatility ETF) separately: we simply use the empirical distribution of its past returns. Finally, we combine the marginals to obtain a distribution on $R_{t_i}$. We compared different approaches on the year of data preceding the competition, and the best approach was not the same for the two tasks. For probabilistic forecasting, we simply treat the assets as independent, so that our joint distribution is the product of the marginals; for investment decision-making, we estimated correlations $\hat r_{a,a'}$ between asset returns, and our joint distribution was a multivariate Gaussian distribution with covariance matrix
$$\Sigma_{t_i} = \left(\hat r_{a,a'}\, \hat\sigma_{t_i,a}\, \hat\sigma_{t_i,a'}\right)_{a,a'}.$$
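The two ways of combining the marginals can be sketched as Monte-Carlo samplers of the 4-week log-returns. This Python sketch assumes i.i.d. daily returns over the 20-day horizon, so that the horizon return has mean $20\hat\mu_a$ and covariance $20\,\Sigma_{t_i}$; the Cholesky factorization is written by hand to stay dependency-free, and `sample_volatility_etf`, which draws uniformly from past 4-week returns, is one hypothetical reading of "the empirical distribution of its past returns". All inputs in the usage lines are made up.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def make_sampler(mu, sigma, corr=None, horizon=20):
    """Sampler of 4-week log-returns for assets with Gaussian daily marginals.

    corr=None: product of the marginals (independent assets).
    corr given: multivariate Gaussian with daily covariance
    Sigma[a][b] = corr[a][b] * sigma[a] * sigma[b].
    Assumes i.i.d. daily returns: horizon mean horizon*mu, covariance horizon*Sigma.
    """
    n = len(mu)
    L = None
    if corr is not None:
        cov = [[corr[a][b] * sigma[a] * sigma[b] for b in range(n)] for a in range(n)]
        L = cholesky(cov)

    def draw(rng):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if L is None:
            shocks = [s * zi for s, zi in zip(sigma, z)]
        else:
            shocks = [sum(L[a][k] * z[k] for k in range(a + 1)) for a in range(n)]
        return [horizon * m + math.sqrt(horizon) * e for m, e in zip(mu, shocks)]

    return draw


def sample_volatility_etf(past_horizon_returns, rng):
    """Hypothetical treatment of the volatility ETF: resample a past 4-week return."""
    return rng.choice(past_horizon_returns)


# Usage with hypothetical inputs: two assets with correlation 0.6.
mu, sig = [0.0003, 0.0002], [0.02, 0.01]
independent = make_sampler(mu, sig)
correlated = make_sampler(mu, sig, corr=[[1.0, 0.6], [0.6, 1.0]])
rng = random.Random(3)
print(independent(rng), correlated(rng), sample_volatility_etf([-0.2, 0.05, 0.4], rng))
```

With an identity correlation matrix, the correlated sampler reduces to the product of marginals, which is a convenient sanity check.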

Optimization of the expected loss function
For each task, our submission is a vector $x_{t_i} \in \mathbb{R}^p$, and the task defines a loss function $\ell$ (the RPS for probabilistic forecasting; the negative information ratio for investment decisions, since the information ratio is to be maximized) that depends on our submission and on the return matrix $R_{t_i}$: the evaluation is $\ell(x_{t_i}, R_{t_i})$.
The evaluation is the average of the loss over the 12 submission points. As we have a probabilistic prediction of $R_{t_i}$, it is natural to minimize the expected loss under that distribution; if our distribution were correct, this would be optimal for a very large number of submission points. Our objective is the following:
$$x_{t_i} \in \arg\min_{x} \mathbb{E}_{R \sim P_{t_i}}\left[\ell(x, R)\right].$$
Our optimization procedure relies on ADAM [17]. Each optimization step relies on a mini-batch of 100 samples $R^{(k,1)}, \ldots, R^{(k,100)}$ drawn from $P_{t_i}$, and the gradient used in the ADAM step is
$$g_k = \frac{1}{100}\sum_{s=1}^{100} \nabla_x \ell\left(x^{(k)}, R^{(k,s)}\right),$$
where $k$ is the iteration number.
The convexity of the RPS guarantees that this procedure converges to the optimal point for probabilistic forecasting, under the assumption that our distribution $P_{t_i}$ is correct. The negative information ratio is not convex, and there is no guarantee of convergence to the optimal point for investment decision-making. As a sanity check, we ensure that the attained point $x_{t_i}$ yields a better expected information ratio than the naive uniform portfolio allocation. During the competition, we observed that our information ratio was slightly above that of the uniform benchmark, confirming this property, but not significantly better.
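The expected-loss minimization with ADAM and Monte-Carlo mini-batches of 100 samples can be sketched generically as follows. In this Python sketch, `grad_loss` and `sample` are placeholders for the task-specific loss gradient and the forecast distribution $P_{t_i}$; the ADAM moment parameters are the usual defaults of [17], the learning rate is our choice, and the quadratic toy loss (whose expected minimizer is $\mathbb{E}[R]$) only checks that the procedure lands where it should.

```python
import math
import random


def adam_expected_loss(grad_loss, sample, x0, rng, steps=500, batch=100,
                       lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize E_{R ~ P}[loss(x, R)] with ADAM on Monte-Carlo mini-batches.

    grad_loss(x, R) returns the gradient of the loss w.r.t. x, and sample(rng)
    returns one draw of R from the forecast P; both are placeholders.
    """
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for k in range(1, steps + 1):
        # Mini-batch estimate g_k of the gradient of the expected loss.
        g = [0.0] * len(x)
        for _ in range(batch):
            for j, gj in enumerate(grad_loss(x, sample(rng))):
                g[j] += gj / batch
        # Standard ADAM moment updates with bias correction.
        m = [b1 * mj + (1 - b1) * gj for mj, gj in zip(m, g)]
        v = [b2 * vj + (1 - b2) * gj * gj for vj, gj in zip(v, g)]
        m_hat = [mj / (1 - b1 ** k) for mj in m]
        v_hat = [vj / (1 - b2 ** k) for vj in v]
        x = [xj - lr * mh / (math.sqrt(vh) + eps)
             for xj, mh, vh in zip(x, m_hat, v_hat)]
    return x


# Toy check: for loss(x, R) = ||x - R||^2 the expected loss is minimized at E[R].
mean = [1.0, -2.0]


def sample(rng):
    return [rng.gauss(mj, 1.0) for mj in mean]


def grad(x, R):
    return [2.0 * (xj - rj) for xj, rj in zip(x, R)]


x_star = adam_expected_loss(grad, sample, [0.0, 0.0], random.Random(4))
print([round(xj, 2) for xj in x_star])
```

For the RPS, `grad_loss` would return the gradient with respect to the $100 \times 5$ submission (flattened), and a projection or reparametrization would be needed to keep valid probabilities; for the information ratio, the same loop applies but, as noted above, without convergence guarantees.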
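The sanity check against the uniform allocation can be sketched as a Monte-Carlo comparison of expected information ratios. We assume here that the information ratio is the sum of daily portfolio log-returns divided by their standard deviation, computed from asset simple returns; this is our reading of the M6 rules, and the two-asset toy sampler and both weight vectors are hypothetical.

```python
import math
import random
import statistics


def information_ratio(weights, daily_simple_returns):
    """Assumed M6-style IR: sum of daily portfolio log-returns over their std.

    daily_simple_returns[t][a]: simple return of asset a on day t.
    """
    rets = [math.log1p(sum(w * r for w, r in zip(weights, day)))
            for day in daily_simple_returns]
    return sum(rets) / statistics.stdev(rets)


def expected_ir(weights, sampler, rng, n_scenarios=200):
    """Monte-Carlo estimate of the expected IR over simulated 20-day paths."""
    return statistics.fmean(information_ratio(weights, sampler(rng))
                            for _ in range(n_scenarios))


def toy_sampler(rng, days=20):
    """Hypothetical two-asset forecast: asset 0 drifts up, asset 1 drifts down."""
    return [[rng.gauss(0.01, 0.01), rng.gauss(-0.01, 0.01)] for _ in range(days)]


candidate = [0.5, 0.0]  # allocation returned by the optimizer (hypothetical)
uniform = [0.5, 0.5]    # naive uniform allocation
rng = random.Random(5)
ok = expected_ir(candidate, toy_sampler, rng) > expected_ir(uniform, toy_sampler, rng)
print("candidate beats uniform:", ok)
```

Because the information ratio is a non-linear function of the whole return path, its expectation is estimated scenario by scenario rather than from per-asset marginals, which is why the correlated sampler matters for this task.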

Discussion
Our main focus was the probabilistic forecasting part of the M6 competition, in which we ranked 5th out of 163 competitors. Note that only 38 participants outperformed the naive benchmark designed by the organizers (M6 dummy). This achievement underscores the potential of our online approach to contribute to the understanding of market dynamics in financial time-series analysis.
In contrast, our performance in the investment decision task was less prominent: we ranked 42nd, compared to the benchmark's 48th position.
Overall, we ranked 7th in the M6 competition. Previous M competitions have established that statistically sophisticated methods do not necessarily outperform simple ones [20]. We claim that the simplicity of our methodology explains its robustness and this good performance.

Interpretation of our Results
These observations prompt three key interpretations that shed light on our strategy's behavior in the two tasks.
Market understanding. The first interpretation is philosophical. As stated in the introduction, we did not try to contradict the EMH. Instead, we embraced it, and asked ourselves what the probabilistic forecast of the efficient market would be. Our analysis of the RPS in Section 2 explains how the uniform benchmark can be outperformed using a probabilistic forecast of asset returns. Therefore, it is natural to obtain good performance in the probabilistic forecasting task. We did not conduct a similar study of the information ratio, and we conjecture that those who outperformed us had a deeper understanding of the market. This acknowledgment emphasizes the complexity of financial markets and encourages continuous exploration and refinement of our approach to align with the dynamic and nuanced nature of market behavior.
Univariate vs. multivariate forecasting. A second explanation lies in the representation of the joint distribution. Our strategy relies on univariate time-series forecasting from AdaVol, incorporating correlations for the investment decision task while using the product of marginals for the probabilistic forecasting task. The metric of the latter, the Ranked Probability Score (RPS), favors an individualized approach, as it is the average of the individual assets' RPS. Correctly predicting the outcome for one asset has minimal impact on the RPS of another. Hence, the relative performance of each asset can be forecast independently.
However, for investment decisions, the information ratio is a ratio of two quantities: the numerator is the return, which can potentially be decomposed into independent asset returns, while the denominator (the standard deviation of the portfolio's daily returns) adds a non-linearity and requires modeling multivariate returns directly. Future work may explore a multivariate version of AdaVol to enhance our strategy in the investment decision task.
Optimization challenges. Our last interpretation is technical. It stems from the convexity of the RPS in contrast to the non-convexity of the negative information ratio. Our optimization procedure is a gradient descent; it readily identifies optimal points for the probabilistic forecasting task, while this is not guaranteed for the investment decision task. This observation highlights a potential avenue for refining our optimization approach to address the specific challenges posed by the investment decision task. In particular, an in-depth analysis of the information ratio would certainly have improved our results; for instance, it is better to rescale the allocation to be as small as possible (summing to 0.25 in the competition) [26, Section 2.2.2].

Future work
Future work should refine our strategy by exploring a multivariate version of AdaVol, offering a more comprehensive modeling approach for investment decisions. Indeed, incorporating correlations into AdaVol could improve stability and robustness in ill-conditioned settings [12]. Additionally, optimization procedures tailored to non-convex metrics could provide valuable insights and help address the intricacies of the investment decision task. Furthermore, running parallel versions of AdaVol with different risk appetites, combined through expert aggregation, could increase robustness [30]; indeed, expert aggregation has provided very competitive results in various competitions [11,28].
Finally, Bayesian algorithms should be tested to adapt the GARCH coefficients in the same setting as AdaVol; state-space models have been shown to provide a good representation for adapting machine learning models [28]. We believe this framework could be applied to the adaptation of GARCH.
In conclusion, the success in the forecasting task, coupled with the identified challenges in investment decisions, motivates us to continue refining and expanding our approach.Continuous exploration and adaptation will be crucial in unlocking the full potential of our adaptive volatility method in the realm of online probabilistic forecasting and investment decision-making.

Figure 1: Histogram of VXX log-returns during years 2015 to 2020. During that period, this asset falls essentially in the extreme quintiles (62.5% of the time in the worst, 28.7% in the best); see Table 1.

Table 1: Frequency of each quintile for each class during years 2015 to 2020. Quintile Q1 is composed of the 20 best-performing assets, while Q5 is composed of the 20 worst-performing assets.

Table 2: Summary of the specificities of each task.