Joint modeling of choices and reaction times based on Bayesian contextual behavioral control

In cognitive neuroscience and psychology, reaction times are an important behavioral measure. However, in instrumental learning and goal-directed decision making experiments, findings often rely only on choice probabilities from a value-based model, and not on reaction times. Recent advances have shown that it is possible to connect value-based decision models with reaction time models. However, these models typically do not provide an integrated account of both value-based choices and reaction times, but simply link two types of models. Here, we propose a novel integrative joint model of both choices and reaction times by combining a computational account of Bayesian sequential decision making with a sampling procedure. This allows us to describe how internal uncertainty in the planning process shapes reaction time distributions. Specifically, we use a recent context-specific Bayesian forward planning model, which we extend by a Markov chain Monte Carlo (MCMC) sampler to obtain both choices and reaction times. As we will show, this makes the sampler an integral part of the decision making process and enables us to reproduce, using simulations, well-known experimental findings in value-based decision making as well as in classical inhibition and switching tasks. Specifically, we use the proposed model to explain both choice behavior and reaction times in instrumental learning and automatized behavior, in the Eriksen flanker task, and in task switching. These findings show that the proposed joint behavioral model may describe common underlying processes in these different decision making paradigms.


Supplementary file: Relation to DDM
The drift diffusion model (DDM) can be interpreted as an approximation to a sequential probability ratio test [1], which links the DDM to probabilistic processing. Indeed, an equivalence between Bayesian updating and the drift diffusion model can be established for perceptual decision making [2,3]. Bitzer et al. showed how the quantities in a DDM, e.g. drift rate, bias, and boundary, relate to sequential updating of a Bayesian posterior and vice versa [2]. Here, we want to show that a similar equivalence cannot be established for the DDM and the BCC. In short, sequential posterior updates, where the posterior becomes the new prior, are required to establish the link to the sequential probability ratio test in the DDM. In the BCC, however, sampling is used to update or learn Bayesian hyper-parameters, which is a fundamentally different underlying process and leads to a different type of update equation in each step. In what follows, we show this in more detail.
In the perceptual decision making model in [2], the authors used a recursive Bayesian update rule to model decision making in a perceptual decision making experiment,

p(a | x_1:t) ∝ p(x_t | a) p(a | x_1:t−1),

where p(x_t | a) is a Gaussian representing the generative process, and p(a | x_1:t−1) is the previous posterior. The evidence accumulation process starts at t = 0 with the prior p(a) = p_0 in place of a previous posterior. The above update is repeated for each observation x_t, updating beliefs about which alternative a best explains the observations, which in turn translates to the currently best action. The authors translated this into an additive update rule for the log posterior odds,

z_t = z_t−1 + ln p(x_t | a = 1) − ln p(x_t | a = 2),

and were able to show that this can be mapped to the DDM update rule

z_t = z_t−1 + v∆t + s√∆t ε_t,

where ∆t is the time between time steps, v is the classical drift rate in the DDM, s is the diffusion constant, and ε_t is the noise term. Using the form equivalence of the two equations, the authors show how exactly these quantities relate to the parameters of the Bayesian model, i.e. the means and variances of the Gaussians.
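This equivalence can be illustrated numerically. The following is a minimal sketch (not the exact parameterization of [2]): two alternatives with assumed Gaussian observation means ±0.5 and unit noise, where the sequential log-odds increments behave like discretized DDM steps with a drift and diffusion constant that follow analytically from the Gaussian parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative process: Gaussian observations for two alternatives.
mu = {1: 0.5, 2: -0.5}   # assumed observation means per alternative
sigma = 1.0              # assumed observation noise
true_a = 1               # alternative actually generating the data

def loglik(x, a):
    # Gaussian log-likelihood up to an a-independent constant.
    return -0.5 * ((x - mu[a]) / sigma) ** 2

# Sequential Bayesian updating of the log posterior odds:
# z_t = z_{t-1} + ln p(x_t | a=1) - ln p(x_t | a=2)
z = 0.0                  # flat prior => zero initial log odds
trajectory = [z]
for _ in range(5000):
    x = rng.normal(mu[true_a], sigma)
    z += loglik(x, 1) - loglik(x, 2)
    trajectory.append(z)

# Each increment is Gaussian, so the walk is a discretized DDM. Analytically
# (for these parameters): drift v = (mu1 - mu2)(mu_true - (mu1+mu2)/2)/sigma^2
# = 0.5 per step, diffusion s = |mu1 - mu2| / sigma = 1.0.
increments = np.diff(trajectory)
v_emp = increments.mean()
s_emp = increments.std()
```

The empirical drift and diffusion recovered from the increments match the analytical values, which is exactly the form equivalence exploited in [2].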
In terms of the MCMC sampling in the BCC, the update rules (Eqs 9-11) have a different form, as they do not equate to a sequential probability ratio test. In this sampling process (Eqs 9-11), a policy π* is sampled with a probability according to the prior p(π*) and accepted into the chain with a probability according to the likelihood p(R|π*). Therefore, the probability of a sampled policy π* being accepted into the chain is proportional to the prior times the likelihood, which means that on average (or for larger n) the updates of the Dirichlet parameters can be expressed as

α_π(n) ≈ α_π(0) + n p(R|π) p(π),

where n is the number of sampling steps, which yields the following update equation for the estimated posterior over policies:

q_n(π) = α_π(n) / Σ_π′ α_π′(n).

If this process corresponded to a sequential probability ratio test, the update in the BCC should be form-equivalent to the update in the Bayesian model in [2]. However, the update in [2] is additive in log space (as it is multiplicative in probability space due to its sequential nature), whereas the update in the BCC is additive in probability space. Hence, the two represent different types of processes, and the BCC cannot be mapped to the DDM in the same way.
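The accept-into-chain scheme described above can be sketched as follows. This is an illustrative toy setup, not the paper's full model: two policies with assumed prior and goal-likelihood values, Dirichlet pseudo-counts initialized flat, and the accepted-sample counts used as the posterior estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

prior = np.array([0.7, 0.3])   # p(pi): assumed prior over two policies
lik = np.array([0.2, 0.9])     # p(R|pi): assumed goal-likelihoods (each <= 1)
alpha = np.ones(2)             # Dirichlet pseudo-counts, flat initialization

# True posterior the sampler should approach: p(pi|R) ∝ p(R|pi) p(pi).
target = prior * lik
target /= target.sum()

for _ in range(10000):
    pi_star = rng.choice(2, p=prior)     # sample a policy from the prior
    if rng.random() < lik[pi_star]:      # accept with probability p(R|pi*)
        alpha[pi_star] += 1              # additive update in probability space

# Estimated posterior over policies from the accumulated counts.
q = alpha / alpha.sum()
```

Note that the counts grow additively in probability space: each accepted sample adds a constant increment to α_π* rather than multiplying the current estimate by a likelihood ratio, which is the structural difference from the sequential updates in [2].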
Indeed, a visual comparison of our Figure 4 and Figure 5a in [2] also shows a key difference: in the BCC, the estimated posterior probability approaches its true value but oscillates around it, with decreasing amplitude as the process gets closer to the stopping criterion. In the model of [2], the posterior approaches the boundary from below but never crosses it.
However, we expect that certain properties are shared, such as the influence of the certainty of the prior over policies or the strength of the evidence in the likelihood.