Neural implementations of Bayesian inference

Bayesian inference has emerged as a general framework that captures how organisms make decisions under uncertainty. Recent experimental findings reveal disparate mechanisms for how the brain generates behaviors predicted by normative Bayesian theories. Here, we identify two broad classes of neural implementations for Bayesian inference: a modular class, in which each probabilistic component of the Bayesian computation is independently encoded, and a transform class, in which uncertain measurements are converted to Bayesian estimates through latent processes. Many recent experimental findings on probabilistic inference broadly fall into these two classes. We identify potential avenues for synthesis across the two classes, as well as disparities that, at present, cannot be reconciled. We conclude that distinguishing among implementation hypotheses for Bayesian inference will require greater engagement between theoretical and experimental neuroscientists in an effort that spans different scales of analysis, circuits, tasks, and species.


Introduction
Behavioral repertoires tend to be rife with feats of precision even though our observations of the environment can be ambiguous and subject to various sources of uncertainty. To mitigate such uncertainty, an organism may exploit several sources of information available to it. Normative theories enable us to better understand this process by quantifying the relative influence of these sources of information. For instance, we can formalize previous experience with relevant variables as a prior distribution and can model uncertainty in our measurements of these variables as a likelihood function (for an in-depth review, see ref [1]). Having defined these sources of information, Bayesian estimation theory [1–5] prescribes how priors and likelihoods can be combined to generate a posterior distribution(1) and thereafter enables the derivation of an estimate that minimizes some performance metric (e.g. expected error) over the posterior. Bayesian theoretical frameworks therefore provide a formal avenue for studying whether an organism's behavior adheres to rational inference principles as it makes choices under uncertainty [2,5–8].
The value of the Bayesian perspective lies in its ability to confer function on seemingly inaccurate or biased behaviors [9]. For instance, several studies have demonstrated the consistency of time perception with Bayesian models, especially in tasks where humans or monkeys must reproduce a previously observed time interval [10–12] or continue a rhythmic beat [13,14]. When stimulus durations in these tasks are sampled from a prior distribution, responses become biased toward the mean of the distribution. Bayesian models provide the insight that such seemingly biased behavior could, in fact, represent an optimal strategy that mitigates uncertainty in measurements by leveraging prior knowledge to minimize overall error. Along these lines, Bayesian theory has helped characterize numerous behaviors in different species in the sensory [4,15–21], motor [22–26], cognitive [9,27], and temporal domains [10–14,28–31]. Notwithstanding skepticism about the suitability of Bayesian models for certain behaviors [32–34], recent developments in modeling have expanded the considerations by which we deem behaviors optimal or suboptimal, for example, complex likelihoods, priors, and policies [35–37], bounded computational power to perform inference [38–41], and limits of learning strategies [42–45].

(1) Derived from the Latin phrase a posteriori, which Immanuel Kant defined as knowledge that comes after empirical evidence has been considered. In the Bayesian case, a posterior is subjective knowledge that comes after both the a priori knowledge (or prior) and observations, in the form of likelihoods, have been considered. We sometimes refer to these subjective probabilities as belief distributions.
Concomitant with the development of Bayesian models that describe behavior, several theories emerged to explain implementation mechanisms underlying probabilistic computations in neurobiology, spanning synapses [46–48], circuits [13,49], and populations [50–54]. Here, we focus on aspects of these theories that pertain to similarly structured generative processes (Figure 1), which model relationships between the various elements used to carry out Bayesian inference. On the empirical front, recent years have witnessed a spate of findings [14,19,21,26,28,55,56] investigating the neural basis of behaviors consistent with normative probabilistic theories. Moreover, other experimental studies have recently uncovered neural representations of individual components that could underlie Bayesian computations [56–59]. These findings relate to disparate behaviors in different species, which are likely to engage various brain regions. In other words, not only are we confronted with myriad empirical insights that require synthesis, we must also attempt to interpret them under various computational models. To address this many-to-many mapping conundrum, we recommend a conceptual categorization that seeks to simplify the interpretation of empirical findings under existing proposals of neural instantiations of Bayesian inference.
We propose two broad classes under which current implementation theories can be understood. Under the first, which we refer to as the modular perspective, the likelihood and prior are encoded as independent entities, and the mechanism entails a combination of these components to enable full Bayesian inference [1] (Figure 1b). The other, which we call the transform perspective, specifies how uncertain sensory measurements can be directly mapped into Bayesian estimates via latent processes within which prior distributions are embedded (Figure 1c). This process does not mandate encoding of probabilistic distributions on each trial, which may not be necessary for certain tasks and can be resource-intensive for circuit implementation [60]. This implementation-based dichotomy might aid in interpreting existing insights while potentially refining empirical hypotheses of Bayesian theory for different circuits, species, and tasks.

Figure 1. Two perspectives on neural implementation of Bayesian inference. (a) Generative model. Real-world stimuli, received by the sensory apparatus, are corrupted by various sources of internal and external uncertainty to yield a noisy measurement. In this scheme, the brain estimates the stimulus based on the measurement and prior knowledge of the real-world variable to elicit a response. Neural implementation of this inference process, f(m), can be understood in terms of two possible mechanisms. (b) Modular view. Top: the likelihood (orange), p(m|s), and prior distribution (blue), p(s), are independently encoded and combined to generate a posterior distribution (green), p(s|m). An estimate (e) is derived from the posterior using an appropriate cost function. Bottom: this modularity can be thought of in terms of a feedforward network structure where neural firing rates representing belief distributions are encoded independently and sequentially summed toward an output. For example, the rate at an earlier time, r(t = 0), would represent prior knowledge and be combined with the rate after a sensory measurement (likelihood) is made, r(t = 1); subsequent activity could then reflect the posterior (t > 1). (c) Transform view. Top: the Bayesian estimate (e) is obtained by transforming an uncertain measurement through a function f_transform(·). This function, which represents a continuous estimate of the posterior (green), carries within it a representation of the likelihood (orange) and prior (blue). Bottom: the fundamentals of this computation are captured well by a recurrent network/circuit model. Measurements arrive as inputs while the encoding of prior and likelihood is distributed in circuit connectivities and dynamics. An estimate of the posterior (e) can be obtained through an appropriate readout.

Bayesian modular perspective
In this class of neural implementation, probabilistic computations are carried out using independent representations of likelihood, prior, and posterior distributions, followed by the generation of an estimate (Figure 2a). Two prominent frameworks address pathways toward such an implementation. The first, probabilistic population coding (PPC) [61–63], formalizes how optimal inference ensues from population coding of belief distributions, which, when linearly combined, provide a close approximation to Bayes' rule. Important features of PPC that are consistent with the modular view are independent representations of likelihood functions [56] and of prior knowledge (in many cases, the logarithm of the prior) in population codes, thus supporting flexible updates of trial-by-trial uncertainty.
Three recent studies in nonhuman primates reported findings consistent with the predictions of PPC in different circuits and behaviors, providing support for a modular implementation. The first studied smooth pursuit behavior, where monkeys followed a moving target whose speed was drawn from different prior distributions and whose reliability was modulated by changing contrast [26]. The monkeys' pursuit behavior was consistent with the predictions of Bayesian inference, and the authors discovered that preparatory activity in the frontal eye fields encoded information about the prior distribution (Figure 2b). Moreover, neural activity in the same area during the pursuit behavior encoded an estimate of target speed that was consistent with a Bayesian estimator. This echoes the tenets of the modular view by demonstrating independent observability of prior knowledge before its combination with the sensory measurement. Furthermore, the authors showed how a linear PPC model supports their findings (Figure 2b). Similar findings of prior knowledge being reflected in preparatory activity have been reported in different species, areas, and tasks [59,64], which raises possibilities about the generality of such modular computations. The second study [55] consistent with predictions of PPC examined the activity of neurons in the lateral intraparietal (LIP) cortex of monkeys performing a multisensory decision-making task. Here, monkeys were trained to judge perceived heading based on the direction of vestibular (inertial cues) and visual (optic flow) stimuli. Lateral intraparietal neurons integrated visual stimulus velocity and vestibular acceleration over time in a manner consistent with linear PPC models, which, as shown previously, implement a Bayes-optimal solution [63,65]. These findings echo similar results obtained in earlier work [66] and argue for independent representations of likelihoods in different areas.
The third recent study [56] provides empirical evidence for the independent probabilistic encoding of likelihoods (Figure 2c). Monkeys were trained to perform a visual orientation discrimination task that required trial-by-trial use of stimulus uncertainty, thereby providing an incentive to retain a full representation of the likelihood function. Using a Bayesian decision model and machine learning techniques, the authors showed that the full likelihood function can be decoded from population activity in the primary visual cortex. Furthermore, the full-likelihood model better predicted trial-by-trial behaviors of the animals than an alternative model with fixed uncertainty. This work demonstrates how representations of entire likelihood functions can be encoded in neural populations, thereby providing support for the PPC framework and the modular view.
The second neural implementation framework that has the potential to support modular probabilistic representations is sampling codes [51,67,68]. In this scheme, neural activity at a given point in time is assumed to be a sample from a probability distribution, and neural variability across time reflects the uncertainty in the stimulus [69]. These predictions are consistent with the modular view in certain ways. For instance, neural responses in the ferret primary visual cortex confirmed model predictions of how spontaneous activity represents samples from a prior distribution, whereas responses that occurred after sensory stimuli (observations) reflect samples from a posterior [21]. Another important feature of the sampling-based model is that neural variability is a direct consequence of probabilistic computation, not a nuisance feature of neural circuits, and therefore needs to be flexibly controlled depending on the stimulus uncertainty. Recent work [68] demonstrated that a computational model based on an excitatory-inhibitory motif could achieve such flexible control of neural variability, providing mechanistic accounts for several properties of cortical dynamics, including the stimulus-driven reduction of neural variability. This work provides a pathway toward how biologically realistic circuits could perform Bayesian inference using sampling-based codes.
To summarize, modular implementations of Bayesian inference support independent representations of prior distributions and likelihood functions. Such implementations can sustain flexible computations with multiple representations and can support rapid sequential inference [70,71]. For several tasks, however, encoding full representations of priors and likelihoods can prove resource-intensive and inefficient [60,72]. When rapid trial-wise flexibility is not required, one efficient solution entails encoding priors implicitly such that any observation may be nonlinearly transformed into a Bayesian estimate [10,60,73]. We explore this class of implementations in the next section.

Bayesian transform perspective
The transform view assumes that the neural mechanisms underlying Bayesian computations carry out an input–output transformation that encodes the relevant belief distributions from which a Bayesian estimate can be derived [10,60]. To illustrate this in light of the Bayesian transform view (Figure 3a), we examine a recent study where monkeys observed time intervals and then waited for a matching duration before responding [28]. When stimulus intervals across trials were sampled from different prior distributions, monkeys' waiting times were biased toward the mean of the prior distribution (Figure 3b). Because timing responses become more variable with longer durations [82], Bayesian models predict greater uncertainty in likelihoods associated with long durations and, consequently, greater bias toward the mean [10]. This prediction was tested using two prior distributions with short and long durations. During the task, neurons in the dorsomedial frontal cortex (DMFC) modulated their activity differently over the range of each prior, which, when viewed collectively in neural state space, manifested as semicircular population trajectories with different curvatures for each prior (Figure 3c and d). The study reports that when neural states were projected linearly onto the diameters of these semicircular trajectories, the resulting states became compressed toward the mean of the prior (Figure 3c). The authors predicted that the greater the compression in projected neural states, the larger the expected bias toward the mean in behavior. These predictions were borne out in comparisons with Bayesian models and the monkeys' behavior (Figure 3d). Recurrent neural network models, which recapitulated the animals' behavior and DMFC population dynamics, helped further test this hypothesis by enabling direct perturbation of the subspace occupied by the semicircular population trajectories, which, when compressed, elicited larger behavioral bias [28].
Figure 3. The Bayesian transform scheme. (a) An uncertain sensory measurement can be transformed into a Bayesian estimate by means of a nonlinear deterministic mapping f(·). (b) In a recent study [28], monkeys were trained to reproduce observed time intervals, which were sampled from two prior distributions, short (warm colors) and long (cool colors). Because response variability increases with duration, Bayesian models predict larger biases for the longer prior, which were observed in behavioral responses. (c) One hypothesized mechanism by which such biased responses could be generated is the warping of neural population dynamics during interval measurement into semicircular geometries. Linear projections of these geometries onto their own diameters yield states that are compressed toward the mean of the prior. If these readouts are used to generate behavioral responses, the responses will also be biased toward the mean of the prior.

The modus operandi uncovered in these neural dynamics represents a deterministic transformation of an uncertain time measurement into a Bayesian estimate, which corresponds to the mean of the posterior. It therefore supports a mechanism that best reflects the transform class. Although this study did not directly test the assumption, the mechanism elucidated in this work shares many similarities with the theoretical framework called the distributed distributional code (DDC), which does not require full probabilistic representations but instead uses statistical moments of the prior distribution [52,83]. Furthermore, DDC proposes a linear readout to estimate the mean of the posterior, which is consistent both with the population mechanism and with the type of Bayesian estimator described in the aforementioned study [28]. For these reasons, we propose that DDC is well aligned with the transform perspective. Further work is needed, however, to develop a quantifiable approach to interpret different model frameworks and experimental findings under the modular or transform classes.
On the issue of flexibility underlying Bayesian transforms, recent experimental evidence elucidates how transform-class mappings can be sensitive to changes in the relative reliability of likelihoods and priors [14]. In this study, monkeys were trained to make two time measurements before reproducing an interval, while time intervals across trials were sampled from a prior distribution. The task therefore jointly tests the within-trial decrease of measurement uncertainty and the across-trial influence of prior knowledge. In accordance with the predictions of Bayesian estimation theory, neural states decoded from responses in the DMFC were biased during the first time measurement and decreased their bias during the second measurement, consistent with normative expectations of a corresponding decrease in uncertainty. This work demonstrates that the parameters underlying such Bayesian transforms can flexibly adapt to changes in uncertainty. Other recent work has also explored neural correlates of flexible sequential Bayesian computations in rodents performing a spatial navigation task under sensory uncertainty [19].
The second vantage point through which the transform view can be considered is that of synaptic connectivity. Recent attempts have aimed to uncover mechanistic origins of Bayesian inference by studying how prior knowledge can be encoded in synaptic connectivity [13,49]. One of these studies provides a basis for the learning and encoding of prior distributions over time intervals across the synaptic weights of the principal cerebellar neurons, Purkinje cells [13]. The authors propose a computational mechanism by which cerebellar microcircuits transform incoming temporal measurements into states consistent with Bayesian estimates (Figure 3c). Validating such synaptic-level accounts [85,86] may require technological advances in our ability to directly observe the influence of synaptic weights on a large scale and to measure interareal inputs. At the population level, we may also hope for the means to dissociate latent population dynamics into their individual biophysical and cell-type interactions, bridging the two vantage points within the Bayesian transform perspective by leveraging recent advances in theory [87,88]. On the empirical side, it is interesting to note that while the toolkit for circuit interrogation needed to test these distinctions is currently available in rodent models, almost all the studies mentioned here were performed in nonhuman primates, where our ability to study circuits, cell types, and anatomy is significantly curtailed. Achieving these kinds of reductionist unifications will not only require us to look across species; it will also require theory and experiments that bridge insights across cellular, circuit, and population scales.
One point of synthesis arises naturally when considering the possibility that the brain may adopt both modular and transform strategies depending on task demands and the computational power of the local task-relevant network. If optimal behavior in a task requires constant monitoring of uncertainty in variables [70,71], it makes sense to independently encode such variables in circuits [56], as the modular view proposes. On the other hand, if stable learning of environmental statistics is sufficient to perform the task optimally, latent encoding of environmental statistics into the circuit may provide the most efficient solution, as captured by the transform view [53]. An important metric that allows us to gauge task demands and computational power is the expected dimensionality incurred by a task of given complexity in the neural state space [76]. For instance, a neural network that performs a Bayesian transformation for one task with low demands for flexibility will potentially occupy a lower-dimensional space than a network that performs modular Bayesian computations with independent representations of likelihood functions, prior, and posterior distributions. In silico investigations in recurrent neural networks may allow us to determine whether the same network can be trained to implement higher- or lower-dimensional solutions that enable a switch between modular and transform instantiations along a spectrum. One important question that arises is how the brain might arbitrate between these two approaches across different brain regions, given the discovery of high-dimensional encoding of stimuli in sensory cortices [89,90] versus low-dimensional encoding of tasks associated with frontal and motor cortices [77,78,91–99].
In conclusion, neuroscientists are increasingly embracing the need to rigorously quantify behaviors in pursuit of their underlying neural mechanisms [100].
Modeling behaviors with normative theories can further aid this effort by providing simultaneous constraints over both behavior and neural mechanisms, potentially increasing our ability to discover general principles of neural function. Future work in this domain will benefit from a continuation of the recent tradition of interaction between theoretical and experimental neuroscientists.

Highlighted references

13. Narain D, Remington ED, De Zeeuw CI, Jazayeri M: A cerebellar mechanism for learning prior distributions of time intervals. Nat Commun 2018, 9:469.
This work studied how cerebellar circuitry encodes likelihoods for time intervals, which are transformed through a network of synaptic weights that encode prior knowledge. The model provides a mechanism for the learning of prior knowledge in realistic circuits and for the generation of trial-by-trial Bayesian estimates of time intervals. The authors also showed that timing behaviors known to be encoded in the modeled cerebellar circuitry are consistent with Bayesian inference.

14. Egger SW, Remington ED, Chang C-J, Jazayeri M: Internal models of sensorimotor integration regulate cortical dynamics. Nat Neurosci 2019, 22:1871-1882.
This work studied a task that simultaneously gauged the effects of a within-trial decrease in sensory uncertainty and the across-trial influence of prior knowledge, in a novel timing task in monkeys that exhibited Bayes-consistent behavior. The authors provide an account of flexible updating of prediction errors through nonlinear transformations.

56. Walker EY, Cotton RJ, Ma WJ, Tolias AS: A neural basis of probabilistic computation in visual cortex. Nat Neurosci 2020, 23:122-129.
The authors tested a prominent assumption of probabilistic population coding: whether full likelihood functions are represented in the brain. The work studied the primary visual cortex of monkeys as they performed a visual orientation discrimination task that required trial-by-trial use of stimulus uncertainty. They showed that the visual cortex data were most consistent with a full representation of the likelihood function.