
Confidence resets reveal hierarchical adaptive learning in humans

Fig 1

Apparent learning rate modulations in previous designs are not a hallmark of hierarchical processing.

This simulation is inspired by a previous study by Behrens et al. [2], in which the reward probability was not fixed but changed abruptly; the authors used different volatility levels (i.e. different numbers of change points). Similarly, we generated sequences with low volatility (7 change points, see solid vertical black lines) and high volatility (see additional change points, vertical dashed lines). The sequences were binary (absence or presence of reward) and the reward probability was resampled randomly after each change point. We consider two learning models: a hierarchical model, which estimates the reward rate while taking into account the possibility of change points; and a flat model, which computes the reward rate near-optimally from a fixed leaky count of observations with a prior count of 1 for either outcome (see Methods). Each model has a single free parameter (a priori volatility and leak factor, respectively), which we fit to return the best estimate of the actual generative reward probabilities in the low and high volatility conditions together. Keeping those best-fitting parameters equal across both conditions, we measured the dynamics of the apparent learning rate of each model, defined as the ratio between the current update of the reward estimate (θ̂t+1 − θ̂t) and the prediction error leading to this update (yt+1 − θ̂t). The hierarchical model shows a transient increase in its apparent learning rate whenever a change point occurs, reflecting that it gives more weight to the observations that follow a change point. Such a dynamic adjustment of the apparent learning rate was reported in humans [5]. The flat model showed a qualitatively similar effect, despite the leakiness of its count being fixed. Note that since there are more change points in the higher volatility condition (dashed lines), the average learning rates of both models also increase overall with volatility, as previously reported in humans [2].
The lines show mean values across 1000 simulations; s.e.m. was about the line thickness and therefore omitted.
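The flat model and the apparent-learning-rate measure described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the sequence length, leak factor, and random seed are arbitrary choices, and only the flat (leaky-count) model is shown. The leaky counts are Laplace-smoothed with the stated prior count of 1 per outcome, and the apparent learning rate is computed as the update (θ̂t+1 − θ̂t) divided by the prediction error (yt+1 − θ̂t).

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate a binary reward sequence whose underlying probability is
# resampled at abrupt change points (7 change points, low-volatility case;
# sequence length is an arbitrary illustrative choice).
T, n_changes = 300, 7
change_points = set(rng.choice(np.arange(1, T), size=n_changes, replace=False))
p = rng.uniform()
probs = np.empty(T)
for t in range(T):
    if t in change_points:
        p = rng.uniform()          # reward probability resampled at a change point
    probs[t] = p
y = (rng.uniform(size=T) < probs).astype(float)

def flat_model(y, leak=0.95):
    """Flat model: leaky counts of each outcome, prior count of 1 per outcome.

    The leak factor is this model's single free parameter (value here is
    an arbitrary example, not the fitted one from the paper).
    """
    n1 = n0 = 0.0
    estimates = np.empty(len(y))
    for t, obs in enumerate(y):
        n1 = leak * n1 + obs           # leaky count of rewards
        n0 = leak * n0 + (1.0 - obs)   # leaky count of non-rewards
        estimates[t] = (n1 + 1.0) / (n1 + n0 + 2.0)  # smoothed reward rate
    return estimates

theta = flat_model(y)

# Apparent learning rate: current update of the reward estimate divided by
# the prediction error that drove it.
pred_err = y[1:] - theta[:-1]
update = theta[1:] - theta[:-1]
apparent_lr = update / pred_err
```

Averaging `apparent_lr` across many simulated sequences, aligned on the change points, reproduces the kind of transient learning-rate profile plotted in the figure, even though the flat model's leak factor never changes.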


doi: https://doi.org/10.1371/journal.pcbi.1006972.g001