Active inference as a theory of sentient behavior

This review paper offers an overview of the history and future of active inference, a unifying perspective on action and perception. Active inference is based upon the idea that sentient behavior depends upon our brains' implicit use of internal models to predict, infer, and direct action. Our focus is upon the conceptual roots and development of this theory of (basic) sentience, and we do not follow a rigid chronological narrative. We trace the evolution from Helmholtzian ideas on unconscious inference, through to a contemporary understanding of action and perception. In doing so, we touch upon related perspectives, the neural underpinnings of active inference, and the opportunities for future development. Key steps in this development include the formulation of predictive coding models and related theories of neuronal message passing, the use of sequential models for planning and policy optimization, and the importance of hierarchical (temporally) deep internal (i.e., generative or world) models. Active inference has been used to account for aspects of anatomy and neurophysiology, to offer theories of psychopathology in terms of aberrant precision control, and to unify extant psychological theories. We anticipate further development in all these areas and note the exciting early work applying active inference beyond neuroscience. This suggests a future not just in biology, but in robotics, machine learning, and artificial intelligence.


Introduction
Psychologists and neuroscientists are increasingly entertaining the idea of the brain as a "prediction machine", which learns an internal (i.e., generative) model of the lived world, and of the consequences of its actions, to make sense of sensations, predict how the current situation will unfold (i.e., learning and perception), and act in a purposeful manner (i.e., action selection, exploration-exploitation, planning, et cetera). This idea appears in several guises, including the Bayesian brain, the predictive brain, predictive processing, predictive coding, active inference and the free energy principle, to name a few.
Here, we critically review the origins, scope and impact of this idea in fields like psychology and neuroscience. For conceptual clarity, we focus specifically on active inference: a normative theory of sentient behavior that formalizes the "predictive brain" idea and provides a first-principles account of its computational and neuronal processes (Parr et al., 2022).
This breadth of application is appealing, but risks creating a fragmented picture and some uncertainty about its original commitments and conceptual implications. The aim of this brief manuscript is to help researchers using (or interested in) predictive coding and active inference to "connect the dots" and orient themselves within a growing literature. Despite distinct lines of work that emphasize different aspects of active inference, these applications all rest on the same core principles. To foreground these core principles, we will look at the historical and conceptual origins of active inference, to illustrate how its core principles were introduced; then consider briefly how the scope of active inference has expanded into several disciplines; and finally look to future developments. Given the brevity of this treatment, we cannot provide a full introduction to active inference. Rather, we provide an overview of the narrative in Parr et al. (2022), which interested readers can consult.
In the next section, we briefly discuss the conceptual (and historical) roots of active inference in early views of prediction and action-based cognition. We then review some key developments of active inference, by focusing on landmark papers that explain how it stems from a single principle (namely, free energy minimization). We next consider its scope across perception, action, planning, etc. This brief review helps us make the point that active inference provides a unifying perspective on several cognitive topics and theories and across levels of understanding, from conceptual to neural. Finally, we briefly highlight some promising research directions that could expand the scope of active inference, and potentially its impact on psychology and neuroscience.

The conceptual and historical roots of active inference
Active inference has roots in various early theories in cognitive science (and beyond, in fields that would not necessarily use the label "cognitive"). One root is the idea that the brain carries a small-scale model of the environment and uses it to mentally simulate what-if actions, instead of (or before) acting on the environment (Craik, 1943). This idea is foundational in cognitive science. For example, Tolman (1948) proposed that humans, rodents and other animals find their way in a maze by first learning a mental model or "cognitive map", rather than by considering which of their navigation actions were previously rewarded the most, as assumed by behaviorist formulations.
Another root is the idea of Helmholtz (1866) that perception is an (unconscious) inference based on an internal generative model that uses recurrent (top-down and bottom-up) counter-streams of processing, rather than bottom-up transduction of external sensations into internal representations (and later actions). This idea was later developed in psychology (Gregory, 1968, 1980) and computational neuroscience, giving rise to the "Bayesian brain" hypothesis (Doya et al., 2007) and to formulations of predictive coding as a possible neurobiological implementation of perception-as-inference in the brain (Friston, 2005; Rao & Ballard, 1999). Beyond perception, other cognitive functions were later described in terms of inference, e.g., planning-as-inference (Botvinick & Toussaint, 2012).
Yet another "root" is the idea of cyberneticists (Miller et al., 1960; Powers, 1973; Wiener, 1948) that goal-directed action proceeds by first setting up a desired state or observation (e.g., feeling warm), then monitoring the discrepancy (now referred to as a "prediction error") between the preferred and sensed state (e.g., feeling excessively warm), and then selecting a course of action that reduces this discrepancy. Here, "action" is a suitcase word and can include any means to exert control over external stimuli, ranging from simple autonomic reflexes (e.g., thermoregulation) to sophisticated plans (e.g., visiting one's favorite ice cream shop). A key result in this field, which coheres with the Helmholtzian perspective above, is the 'Good regulator theorem' of Conant and Ashby (1970), which argues that effective regulatory systems must be a model of the environment they regulate. In a similar vein, in psychology, ideomotor theory proposed that action control is essentially anticipatory and that actions are selected and controlled by their anticipated consequences or outcomes, not through stimulus-response (Hoffmann, 2003; Hommel, 2003; James, 1890).
Besides cybernetics, there are other influential views that highlight the centrality of adaptive regulation for behavior and life itself. One example is the idea that living organisms are autopoietic systems, which create the conditions for their own existence. More recently, this idea has been framed as 'self-evidencing' (Hohwy, 2016), i.e., creatures seek out sensations that provide evidence for their continued existence. Intuitively, sensing our body temperature to be around 37 °C offers more evidence that we are still alive than body temperatures far from this value. The concept of autopoiesis gave birth to enactive approaches in philosophy (Maturana & Varela, 1980). From another angle, it has been postulated that a central imperative for living organisms is the maintenance of physiological homeostasis (i.e., correction of deviations from preferred physiological states through reflexive actions) and the regulation of basic imperatives (Cannon, 1929), but more modern theories emphasize that physiological regulation is fundamentally anticipatory (i.e., allostatic) (Sterling, 2012). Various researchers have proposed that closed-loop adaptive regulation (and not stimulus-response) is key to understanding not just physiology but (potentially) all cognitive processing (Cisek, 1999; Pezzulo & Cisek, 2016).
Finally, another root is the idea that cognitive processes, such as learning, perception and decision-making, require an active engagement of organisms with the environment. One early example of this action-oriented perspective is the view of Gibson that perceiving things consists in seeing what to do or not to do with them, i.e., perceiving affordances (Gibson, 1979). More recently, various researchers proposed the necessity of a "pragmatic turn" in cognitive science and neuroscience, and the need to recognize the importance of action as part and parcel of our cognition (Buzsaki, 2019; Cisek & Kalaska, 2010; Cisek & Pastor-Bernier, 2014; Engel et al., 2016; Lepora & Pezzulo, 2015; O'Regan & Noe, 2001), rather than just a way to report "central" decisions, as assumed in conventional (serial) theories.
Interestingly, each of these ideas implies a shift from reactive to predictive, enactive views of the brain.While a reactive brain waits for incoming stimuli, a predictive and active brain predicts external events (e.g., predictive coding) and actively gathers evidence (i.e., active sensing and active learning) to make sense of the world.While a reactive brain selects actions based on the past and present (e.g., the history of reinforcement and the current cue), a predictive brain actively imagines its preferred future and then makes this happen by acting (e.g., acts in a goal-directed manner).While a reactive brain maintains homeostasis, a predictive brain acts to anticipate needs and performs anticipatory regulatory (or allostatic) actions.
All these (and other) views contributed to raising the importance of predictive and enactive views of the brain and of cognition. However, each of these perspectives was somewhat disconnected from the others and linked to different research traditions, which are sometimes seen as conflicting with one another (e.g., the Helmholtzian and the Gibsonian traditions). One benefit of active inference is that it helps unify and thereby advance these traditions, as we explain in the following Sections.

The normative perspective of active inference and how it has developed
Active inference provides a normative perspective that unifies and advances the predictive and enactive views of brain and behavior. It does so by highlighting that several apparently disconnected accounts, identified by early theories, stem parsimoniously from the assumption that living organisms obey a single imperative: namely, they act to minimize their surprise¹ or, more formally, their variational free energy.
The mathematics of variational free energy minimization is beyond the scope of this article; interested readers should consult Parr et al. (2022). Here, instead, we introduce the key concepts of the theory, by briefly reviewing (non-chronologically) selected landmark papers and linking them to the early theories.
¹ Technically, surprise here refers to self-information (a.k.a. surprisal); namely, the implausibility of some (sensory) outcome under a (generative) model of how that outcome was generated.
Active inference starts from a simple consideration: to maintain their existence and integrity, all living organisms need to remain in a bounded set of characteristic states that basically define their place within an ecological niche; for example, a fish cannot live out of water. Using the lexicon of Bayesian inference, being out of water for a fish is a "surprising" state. Clearly a fish should avoid this surprise, and the idea generalizes to suggest that living organisms must avoid surprising states (Friston et al., 2010). If they did not, they would not be living organisms for long. Another way of looking at this is that everything (including me) is defined by being in some characteristic (attracting) set of states. Conversely, I am defined by the kinds of states I cannot be in. These are surprising states.
A computationally tractable solution to surprise minimization is the minimization of an information-theoretic quantity, variational free energy, which is a function of two things: a generative model (i.e., a statistical model that describes how sensations are generated) and observed sensory data. This implies that a living organism must be equipped with a generative model (or, in the lexicon of Craik (1943), a small-scale model) to predict the sensations generated by the world (and by the organism's place in it). In Bayesian terms, a generative model comprises two things: a prior over the hidden (i.e., unobserved) variables of interest and a likelihood function that maps the hidden variables to observables (Bishop, 2006). See Fig. 1 for a schematic illustration of the organism's generative model of the world and its relation with the generative process, i.e., the true environmental contingencies that generate its observations and that are inaccessible to the organism.
Put simply, an organism can minimize variational free energy by aligning the predictions of its generative model and the data it observes. In different settings, this minimization has been described in various ways, such as the minimization of surprise, of prediction errors, or of the discrepancy between the model and the world. All of these are equivalent to the minimization of variational free energy under specific sets of assumptions.
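To make the relation between surprise and variational free energy concrete, the following minimal sketch evaluates both for a toy discrete generative model. The two-state "fish" world, its probabilities, and the function names are illustrative assumptions of ours and are not taken from the cited literature. The sketch shows that free energy is an upper bound on surprise, and that the bound becomes tight when beliefs equal the exact posterior.

```python
import numpy as np

# Hypothetical generative model (all numbers are illustrative assumptions).
# Hidden state x: 0 = in water, 1 = out of water.
prior = np.array([0.99, 0.01])                 # P(x)
# Likelihood P(y | x): rows are observations (0 = wet, 1 = dry),
# columns are hidden states.
likelihood = np.array([[0.95, 0.10],
                       [0.05, 0.90]])

def surprise(y):
    """Self-information of observation y: -ln P(y)."""
    return float(-np.log(np.sum(likelihood[y] * prior)))

def posterior(y):
    """Exact Bayesian posterior P(x | y)."""
    joint = likelihood[y] * prior              # P(y, x) as a function of x
    return joint / joint.sum()

def free_energy(q, y):
    """Variational free energy E_q[ln q(x) - ln P(y, x)] for a belief q > 0."""
    joint = likelihood[y] * prior
    return float(np.sum(q * (np.log(q) - np.log(joint))))

dry = 1
f_uniform = free_energy(np.array([0.5, 0.5]), dry)   # a poor belief
f_exact = free_energy(posterior(dry), dry)           # the best possible belief
```

Under these assumed numbers, `f_uniform` exceeds `surprise(dry)`, whereas `f_exact` coincides with it, and a "dry" observation is far more surprising than a "wet" one.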
Interestingly, aligning the predictions derived from a generative model and data can be achieved in two ways: by changing the model predictions and by changing the observed data. The former corresponds to revising the agent's beliefs (used in the technical sense of probability distributions over hidden variables) if they do not explain the data well. This is exactly the inferential view of perception of Helmholtz (1866). The latter corresponds to acting in the world to change the data that will be sampled next, to render them more like the organism's prior predictions. This latter perspective on action, and on its dependence on expected outcomes, is highly congruent with cybernetics (Miller et al., 1960; Powers, 1973; Wiener, 1948) and ideomotor theory (Hoffmann, 2003; Hommel, 2003; James, 1890).
In sum, changing beliefs about the causes of data (i.e., perception) and changing the data (i.e., action) are two aspects of free energy minimization. In formal terms, they map to its two components: the minimization of divergence and the maximization of evidence, see Fig. 2. Recognizing that action and perception can be unified within a single formal imperative, the minimization of free energy, is one of the key innovations of active inference, which helps integrate and extend the early theories reviewed above.
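Although the full mathematics is deferred to Parr et al. (2022), the two components just mentioned can be stated compactly. The expression below is the standard form of variational free energy, written with the symbols that appear in the caption of Fig. 2 (F, Q, x, y, D_KL); the symbol P for the generative model is a notational assumption on our part.

```latex
\begin{align}
F[Q, y] &= \mathbb{E}_{Q(x)}\big[\ln Q(x) - \ln P(y, x)\big] \\
        &= \underbrace{D_{KL}\big[\,Q(x)\,\|\,P(x \mid y)\,\big]}_{\text{divergence}}
           \;-\; \underbrace{\ln P(y)}_{\text{log evidence}}
\end{align}
```

Because the divergence term is never negative, F is an upper bound on surprise, -ln P(y). Perception changes Q to shrink the first term, whereas action changes y so as to increase the second.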
Regarding neural implementation, one of the most widely entertained hypotheses about how the brain might implement perceptual inference is predictive coding (Rao & Ballard, 1999). Fig. 3 shows the architecture of a predictive coding scheme as it might manifest in the cerebral cortex. In this predictive coding network, inference is realized by propagating predictions and prediction errors through top-down and bottom-up pathways, respectively, and by minimizing prediction errors across all levels. Interestingly, predictive coding can be derived as a special case of variational free energy minimization (Friston, 2005).
Fig. 1 (Parr et al., 2022) illustrates the structure of active inferential theories of brain function. Our worlds evolve according to some dynamical process that generates observations (y) from hidden states (x*). Our internal models account for observations in terms of hypothetical hidden states (x). Our inferences about these states based upon our observations then drive actions (u) that intervene on the processes generating our sensations.
Fig. 2 (Parr et al., 2022) highlights the relationship between action and perception via free energy (F). Perception involves minimizing free energy by changing our beliefs (Q) about states (x). This effectively minimizes the divergence (D_KL) between our beliefs and the probability of these states given sensory data (y). Action minimizes free energy through changing those parts of the free energy that depend upon sensory data, notably the evidence or probability of data under our internal model.
Fig. 3 (Parr et al., 2022) shows the message passing between populations of neurons under a predictive coding scheme as it might manifest in the layers of the cerebral cortex (separated into superficial layers I-III, layer IV, and deep layers V-VI). This shows predictions based upon expectations (μ) being subtracted from ascending signals to compute errors (ε), which are used to update expectations. The subscripts indicate whether we are dealing with fast changing dynamical variables (x) or more slowly changing contextual variables (v), which act to link together different hierarchical levels, with hierarchy indicated by the bracketed superscripts. As we ascend the hierarchy, the variables we deal with become slower, such that the contextual variables at one hierarchical level evolve over the same timescale as the dynamical variables at the level above.
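The core loop of predictive coding (predict, compute errors, update expectations) can be sketched in a few lines. This is a deliberately reduced illustration and not the scheme of Rao and Ballard (1999) or Friston (2005): we assume a single level, a scalar hidden cause, a linear mapping `w` from cause to sensation, Gaussian noise, and fixed precisions `pi_y` and `pi_p`; all names and values are hypothetical. The expectation is updated by gradient descent on the sum of squared, precision-weighted prediction errors.

```python
def predictive_coding(y, mu_prior, pi_y=1.0, pi_p=1.0, w=1.0, lr=0.1, steps=200):
    """Infer an expectation mu about a hidden cause from one observation y.

    Assumed objective (up to constants):
        F = 0.5 * pi_y * (y - w * mu)**2 + 0.5 * pi_p * (mu - mu_prior)**2
    """
    mu = mu_prior
    for _ in range(steps):
        eps_y = y - w * mu         # sensory prediction error (ascending signal)
        eps_p = mu - mu_prior      # deviation of the expectation from the prior
        # Gradient descent on F: errors are weighted by their precision.
        mu += lr * (w * pi_y * eps_y - pi_p * eps_p)
    return mu

# Equal precisions: the expectation settles halfway between prior and data.
mu_balanced = predictive_coding(y=2.0, mu_prior=0.0)
# More precise sensations: the expectation moves closer to the data.
mu_precise_senses = predictive_coding(y=2.0, mu_prior=0.0, pi_y=4.0)
```

With these assumed settings the expectation converges to the precision-weighted average of the prior and the observation, which is the exact posterior mean for this linear Gaussian case.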
While predictive coding is a model of perception, it can be readily extended to encompass the role of action in the minimization of free energy (described above). The move from predictive coding to active inference can be realized by equipping predictive coding networks with simple motor reflexes. In this perspective, the motor system works by generating proprioceptive predictions (in the same way standard predictive coding generates exteroceptive predictions), and not motor commands, as conventionally proposed; these proprioceptive predictions are then realized through the motor reflexes (Adams et al., 2013).
Subsequently, this theory was extended to also model autonomic control (Barrett & Simmons, 2015; Pezzulo, 2014; Seth et al., 2012). The general idea is that autonomic control might work by generating interoceptive predictions (i.e., homeostatic setpoints) and then fulfilling them through autonomic reflexes, in much the same way motor control might work by generating proprioceptive predictions and then fulfilling them through motor reflexes. This development of active inference helps connect it with theories of allostatic control (Sterling, 2012) and paves the way to a better understanding of our ability to model and control the internal milieu, not just the external environment. This stream of research underwrote novel approaches to psychopathology, understood as deficits of interoceptive processing (Paulus et al., 2019).
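The idea that a reflex fulfils a proprioceptive prediction, rather than executing a motor command, can be illustrated with a toy loop. The sketch below is an assumption-laden caricature and not the model of Adams et al. (2013): a single scalar "limb position" is sensed without noise, and a reflex with a hypothetical `gain` moves the limb so as to cancel the proprioceptive prediction error. Note that the prediction stays fixed and the data change, which is the opposite of perceptual updating.

```python
def reflex_arc(predicted, sensed, gain=0.5, steps=50):
    """Move a limb until the sensed position matches the predicted position."""
    trajectory = [sensed]
    for _ in range(steps):
        error = sensed - predicted     # proprioceptive prediction error
        sensed -= gain * error         # the reflex acts on the world, changing the data
        trajectory.append(sensed)
    return trajectory

# The prediction "my hand is at 1.0" is realized by movement from 0.0.
path = reflex_arc(predicted=1.0, sensed=0.0)
final_position = path[-1]
```

Here the prediction error is resolved by acting, so the final position approaches the predicted one without any change to the prediction itself.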
So far, we have discussed active inference using generative models that characterize processes that unfold in continuous time (e.g., predictive coding networks) and use continuous variables (i.e., the formal framework of dynamical systems and state-space models). However, many cognitive problems can be characterized at a distinct level: as (sequences of) discrete decisions. These include problems that require the selection of discrete responses during psychology experiments, the targets for saccades, or navigational trajectories in discretized environments (Friston et al., 2017; Friston, Lin, et al., 2017). These problems can be modeled in active inference, using generative models that use discrete variables (and the formal framework of Partially Observable Markov Decision Processes).
In addition to the two aforementioned components (priors and likelihood function), the generative models for active inference in discrete time often include a third component: the transition function, which describes the way in which hidden states change depending upon the agent's actions (or sequences of actions, called policies). Crucially, these generative models have temporal depth and afford a novel capability that was not available in simpler models: namely, planning. In simple terms, planning involves using the generative model to predict the consequences of different policies, scoring the policies according to how much they are expected to minimize free energy in the future, and then (with some simplifications) selecting the best policy.
This planning process induces a novel quantity, expected free energy, which is the functional that active inference uses to evaluate (and assign a prior to) policies; it is distinct from the notion of variational free energy discussed so far (Friston et al., 2017). The notion of expected free energy has been very useful in the development of active inference models of things like (bounded) decision-making, planning, exploration-exploitation and curiosity (Friston, Lin, et al., 2017; Parr & Pezzulo, 2021; Schwartenbeck et al., 2019). This is because it is richer than the common optimization objectives used in other formal frameworks (e.g., economic theory and reinforcement learning): expected free energy jointly considers a pragmatic imperative (utility maximization) and an epistemic imperative (information gain, or the resolution of uncertainty). Indeed, as Fig. 4 illustrates, it is possible to map expected free energy to various other formal notions (e.g., Bayesian surprise, risk-sensitive control, expected utility theory) by removing one or more of its terms.
Active inference is a general scheme that can be applied to address various cognitive processes. Crucially, the functioning of active inference is the same across all problems: what differs is the generative model, which is task specific. This implies that by designing the appropriate generative models, it is possible to address a variety of cognitive tasks with the same approach, and to pass from the normative perspective of active inference to specific implementations that have biological plausibility (Friston, Parr, et al., 2017; Parr & Friston, 2018).
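The way expected free energy combines epistemic and pragmatic imperatives can be sketched for the simplest possible case. The example below is an illustration under stated assumptions, not an implementation from the cited papers: policies are single actions, there are two hidden states and two outcomes, and the arrays `A` (likelihood), `B` (transitions per action) and `log_C` (log prior preferences over outcomes) follow a naming convention common in the discrete-time literature but carry invented values. Expected free energy is computed as negative information gain minus expected log preference, and the prior over policies is a softmax of its negative.

```python
import numpy as np

def expected_free_energy(qs, A, B_a, log_C):
    """Return (G, info_gain, pragmatic) for a one-step policy with transitions B_a."""
    eps = 1e-16                                # guards against log(0)
    qs_next = B_a @ qs                         # predicted states under the policy
    joint = A * qs_next                        # Q(o, s | policy): column s scaled by Q(s)
    qo = joint.sum(axis=1)                     # predicted outcomes Q(o | policy)
    indep = np.outer(qo, qs_next)              # product of the two marginals
    # Epistemic value: expected information gain about states (mutual information).
    info_gain = float(np.sum(joint * (np.log(joint + eps) - np.log(indep + eps))))
    # Pragmatic value: expected log preference over outcomes.
    pragmatic = float(qo @ log_C)
    return -info_gain - pragmatic, info_gain, pragmatic

A = np.array([[0.9, 0.1],                      # P(o | s), columns are states
              [0.1, 0.9]])
B = {"stay": np.eye(2),
     "switch": np.array([[0.0, 1.0],
                         [1.0, 0.0]])}
log_C = np.log(np.array([0.1, 0.9]))           # outcome 1 is preferred
qs = np.array([1.0, 0.0])                      # agent is sure it is in state 0

G = np.array([expected_free_energy(qs, A, B[a], log_C)[0] for a in B])
policy_prior = np.exp(-G) / np.exp(-G).sum()   # softmax over negative G

# With an uncertain belief and flat preferences, only information gain matters.
_, ig_uncertain, _ = expected_free_energy(
    np.array([0.5, 0.5]), A, B["stay"], np.log(np.array([0.5, 0.5])))
```

In this toy setting the "switch" policy has the lower expected free energy because it leads to the preferred outcome. Setting the preferences flat, as in the last call, removes the pragmatic contrast and leaves only the epistemic term, which echoes the idea of recovering simpler objectives by dropping terms.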
Here, a worked example may be helpful. To illustrate some of the principles we have outlined so far, we will consider how we might go about developing a model for a ubiquitous task in cognitive neuroscience: a delay period oculomotor task. This is a relatively simple task that can be performed by humans (and some animals) and that is designed to probe working memory function (Funahashi et al., 1989). The task sequence is as follows. First, a cross is presented on screen and our subject maintains fixation on this cross. A target then appears at one of several possible locations towards the periphery of the screen, but our subject still maintains fixation. The target then disappears and, after a 'delay period', a stimulus appears to signify that the subject should make a saccadic eye movement to the location of the target. Successful performance of this task relies upon retaining a memory of the target location during the delay and response phases.
To model this task, we must consider the data available to the subject. In this case, these are the visual stimuli and proprioceptive inputs, and whether the correct action was chosen. To do so, we need to take account of the causes of these data. The causes of proprioceptive data are simply the direction in which our subject's eyes are pointing. Visual outcomes depend upon a combination of (1) gaze direction, (2) the intended target location, and (3) the current stage of the task (i.e., the fixation, target presentation, delay, or response stage). For each of these three variables, we must then specify how we expect them to evolve throughout the task. The gaze direction will transition from one step to the next based upon the decisions our subject makes. The intended target location will be fixed (although initially unknown) throughout the task. The task stage evolves predictably through a sequence of steps. Together, these beliefs about the way in which data are generated and the dynamics of the causes allow our subject to predict what will be observed next, and to update these beliefs when these predictions are violated.
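Part of the factorized structure just described can be sketched as follows. This is a heavily simplified illustration and not the model used in the literature: we keep only two of the three state factors (task stage and target location), omit gaze direction, proprioception and expected free energy, assume the subject knows the task stage with certainty, and invent the number of locations (`N_LOC`) and the cue reliability (`RELIABILITY`). The point is only to show how a fixed (identity) transition for the target location lets a belief formed at target presentation persist through the delay, which is the model's analogue of working memory.

```python
import numpy as np

STAGES = ["fixation", "target", "delay", "response"]
N_LOC = 4            # hypothetical number of peripheral target locations
RELIABILITY = 0.95   # hypothetical probability of seeing the target where it is

# Task stage advances deterministically; the response stage is absorbing.
B_stage = np.zeros((4, 4))
for i in range(4):
    B_stage[min(i + 1, 3), i] = 1.0

# The target location is fixed throughout a trial (identity transition).
B_target = np.eye(N_LOC)

def visual_likelihood(obs, stage):
    """P(obs | target location) as a vector over locations, for a given stage."""
    if STAGES[stage] != "target" or obs is None:
        return np.ones(N_LOC)        # nothing on screen: uninformative
    lik = np.full(N_LOC, (1 - RELIABILITY) / (N_LOC - 1))
    lik[obs] = RELIABILITY
    return lik

belief = np.full(N_LOC, 1 / N_LOC)   # target location initially unknown
stage = np.array([1.0, 0.0, 0.0, 0.0])
observations = [None, 2, None, None] # target flashed at location 2 in stage "target"
history = []
for obs in observations:
    s = int(np.argmax(stage))
    belief = visual_likelihood(obs, s) * belief   # Bayesian update on the cue
    belief /= belief.sum()
    history.append(belief.copy())
    belief = B_target @ belief                    # carry the belief to the next step
    stage = B_stage @ stage                       # advance the task stage

saccade_target = int(np.argmax(history[-1]))      # location chosen at the response stage
```

Under these assumptions the belief is uniform during fixation, sharpens when the target appears, and is retained unchanged through the delay, so the saccade selected at the response stage goes to the cued location.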
As outlined above, active inference equips models with prior beliefs about the relative plausibility of different choices based upon their relative expected free energies. In this model, the key part of the expected free energy is a preference for receiving the 'correct' feedback outcome, which is only available during the response phase of the task (see Mirza et al. (2016) for a similar setup in the context of scene categorisation, in which the main role of the expected free energy is to promote information seeking). It is this that prompts a saccade to the remembered target location. Finally, the predicted action must be executed. This depends upon resolving the error between the anticipated proprioceptive information given the inferred saccade and current proprioceptive input. The result is the sequence of steps shown in Fig. 5.
[Fig. 4 caption fragment: "… for preferences. Note that some terms (including term 1) are expressed in terms of expectations, i.e., averages under the subscripted distribution."]
The oculomotor control example illustrates how active inference can be concretely applied to study cognitive tasks, by designing (or learning) the appropriate generative models. Generative models represent formal hypotheses about how cognitive tasks are accomplished: hypotheses that can be validated with empirical data. A useful illustration of the design principles to realize (or train) generative models for different cognitive problems is provided in Parr et al. (2022). This treatment makes a distinction between generative models in continuous time (useful for motor control tasks) and discrete time (useful for decision and planning tasks) and explains how these two types can be combined to form so-called hybrid or mixed generative models, in which discrete-time models are placed on top of continuous-time models. Furthermore, the generative models of active inference can be extended hierarchically, to model processes that unfold at different timescales. One example is the model of active listening processes, in which (for example) lower hierarchical levels deal with words and higher levels deal with sentences (Friston et al., 2021). Another example is a model of hierarchical action recognition that recognizes actions at different levels, from low level kinematics to higher level goals and intentions (Proietti et al., 2023). It is also possible to use hierarchical models to model hierarchies of control, in which lower-to-higher levels deal with autonomic imperatives (e.g., ensure a correct basic temperature) in increasingly complex ways (e.g., from thermoregulation to the goal-directed plan to buy water before a long run) (Pezzulo et al., 2015; Tschantz et al., 2021). These developments, from simple to more sophisticated (e.g., hierarchically and temporally deep) generative models, have extended the range of cognitive models that have been addressed using active inference over the years.
Another interesting realization is the fact that it is possible to derive a biologically motivated "process theory" for active inference in discrete time, by interpreting the specific operations (variational updates) required to minimize free energy as signals that are computed or exchanged across neurons (Friston et al., 2017). This is important because it permits crossing levels of explanation, from normative to mechanistic and neuronal, and using active inference to simulate the neuronal activity that would ensue from the performance of cognitive tasks (Friston, Parr, et al., 2017; Parr & Friston, 2018).
Another important development of active inference regards precision control and its role in psychopathologies. In predictive coding, variables are encoded as Gaussian distributions and precision simply refers to the inverse of the variance of a distribution (Friston, 2005). Precision control refers to a mechanism that optimizes the precision of (the distribution of) each variable of an organism's generative model. It is important since it regulates the relative importance of top-down predictions and bottom-up prediction errors across the hierarchy. This is because prediction errors that are assigned greater (lower) precision have greater (lower) impact on the belief updating and the ensuing inference. Veridical inference requires the precision of (the distribution of) each variable to be optimized, to reflect the signal-to-noise ratio of sensory signals (therefore highlighting a link between precision control and attention as gain control; Parr & Friston, 2019a) or the importance of an organism's prior preferences (reflecting the fact that an organism's innate drives or goal states can be encoded as highly precise priors; Pezzulo et al., 2015). Interestingly, when precision control fails, it can produce excessively rigid forms of inference (when priors fail to be updated in the light of novel evidence) or excessive sensitivity to stimuli (when belief revision follows the sensory input and its random fluctuations too closely, i.e., it overfits). These forms of aberrant inference, which depend sensitively on predicted precision, have been adopted to explain several psychopathological conditions, such as delusions, depression, psychosis, and many others (Barrett et al., 2016; Corlett & Fletcher, 2015; Edwards et al., 2012). In turn, these theories also speak to aberrant neuromodulation, since the precision of (the distribution of) different variables might be encoded by different neuromodulators, e.g., acetylcholine for the precision of the likelihood, noradrenaline for the precision of transitions, dopamine for the precision of policies, etc. (Parr & Friston, 2018).
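The two failure modes of precision control mentioned above (rigid priors and oversensitivity to sensory fluctuations) can be illustrated numerically. The following sketch is a toy demonstration under our own assumptions and is not drawn from the cited clinical models: beliefs are scalar Gaussians, the observation sequence is invented, and the prior precision is deliberately held fixed across steps (rather than accumulated, as exact Bayesian filtering would do) so that the relative weighting of prior and data stays constant.

```python
def update(mu, pi_prior, y, pi_sens):
    """Precision-weighted belief update for a scalar Gaussian belief."""
    gain = pi_sens / (pi_prior + pi_sens)   # weight given to the prediction error
    return mu + gain * (y - mu)

def track(observations, pi_prior, pi_sens, mu0=0.0):
    """Apply the update to a sequence of observations and return all beliefs."""
    mu, trace = mu0, []
    for y in observations:
        mu = update(mu, pi_prior, y, pi_sens)   # prior precision kept fixed (simplification)
        trace.append(mu)
    return trace

# Invented noisy observations of a quantity whose true value is about 1.0.
observations = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7]

rigid = track(observations, pi_prior=100.0, pi_sens=1.0)    # overly precise prior
balanced = track(observations, pi_prior=1.0, pi_sens=1.0)
oversensitive = track(observations, pi_prior=1.0, pi_sens=100.0)  # overly precise senses
```

With an overly precise prior the belief barely leaves its starting value despite consistent evidence, whereas with overly precise sensations the belief chases each noisy sample almost exactly.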
Yet another development regards the analysis of generative models during sleep or other 'offline' periods. It has long been hypothesized that learning generative models benefits from alternating on-line and off-line periods (Hinton et al., 1995). While on-line generative modelling maximises accuracy (under complexity constraints), during off-line activity, in the absence of sensory data to "explain away", model optimisation can focus on minimising complexity; for example, by removing redundant parameters (Friston, Lin, et al., 2017; Pezzulo et al., 2021). From a neuronal perspective, generative modelling during offline periods could be associated with (generative) replay activity in the hippocampus, the prefrontal cortex and other brain areas; but these links remain to be fully established (Foster, 2017; Schwartenbeck et al., 2023; Stoianov et al., 2022). Finally, an interesting development regards the realization of active inference in which the free energy minimization extends "beyond the skull", to model the ways multiple active inference agents engage in cooperative or competitive tasks (Friston & Frith, 2015; Maisto et al., 2023) or construct their own niches (Constant et al., 2022). These and other works illustrate that the concept of free energy minimization can readily extend to multi-agent settings, including settings that go beyond the standard scope of cognitive science, such as morphogenesis (Friston, Levin, Sengupta, et al., 2015) and autopoiesis (Friston, 2013), and hence potentially shed light on the relations between multiple nested levels of (self-)organization, from individual to social and cultural levels.
In sum, we have highlighted various developments of active inference, which encompass the complementary roles of perception and action in minimizing an organism's variational free energy (and ensuring that it successfully avoids "surprising" and uncharacteristic states), the proposal of biologically plausible architectures for continuous time predictive coding and action control, the realization of generative models for discrete decisions that afford planning and the minimization of expected free energy, the hierarchical extension of these models, the importance of precision control, and beyond. For each of these topics, we have cited some selected papers that interested readers might want to consult for more detailed information. Clearly, this is not an exhaustive list, but each of these developments has been useful to develop models of increasingly complex cognitive and social functions; see Parr et al. (2022) for a more exhaustive treatment of active inference.

The benefits of unification
In the previous section, we saw that the scope of active inference touches several domains of psychology and neuroscience. Here, we foreground a benefit of this rapid expansion: namely, unification.
Arguably, a main goal of cognitive psychology and neuroscience is explaining behavior and its neural foundations in a comprehensive (if not a unified) way. Yet, to ensure methodological rigor, these disciplines usually adopt restricted laboratory settings that tend to isolate cognitive functions and obfuscate their relations (Maselli et al., 2023). Consider, for example, a mundane task that we solve almost every day: crossing a busy road. Even this relatively simple task engages several cognitive processes in a coordinated manner, such as perception (of the situation), memory (of past street crossing episodes), planning and action selection (of the best route), motivation (the "why" of crossing), attention (to select the most relevant stimuli), etc. These processes are often studied in isolation using different paradigms, leading to a proliferation of hypotheses and theories that assign each of them a distinct computational objective (and perhaps brain area), and therefore to a very fragmented theoretical landscape.
Active inference proceeds the other way around: it starts from a single principle and asks how far one can go with it, and to what extent it is possible to derive from that principle empirically testable hypotheses about behavior and its cognitive and neural mechanisms. This approach brings the benefits of unification, in at least six ways.
First, active inference assumes that everything, from perception to action selection and learning, ultimately serves to minimize variational free energy. A consequence of this is that one can align the (sometimes vague) conceptual terms used in psychology with crisp formal terms of free energy minimization. For example, one can assign things like attention to precision control. At the neuronal level, the fast updates, mediated by synaptic activity, might correspond to inferential processes that minimize free energy at a fast timescale, whereas the slower updates, at the level of synaptic efficacy, might correspond to learning processes that minimize free energy at a slower timescale. Precision dynamics might correspond to the activity of neuromodulators, which finesse the inference at multiple levels, for example, by increasing the post-synaptic gain of sensory or prediction error units (Feldman & Friston, 2010). Oscillatory dynamics that are ubiquitous (and that often occur in synchrony) both within and across brain areas might be signatures of temporal prediction and of the exchange of top-down and bottom-up information across hierarchical levels of the brain's generative model (Arnal & Giraud, 2012).
Second, active inference suggests that cognitive functions, usually addressed in isolation, might instead be better understood by appealing to a unique process theory. For example, in prominent computational neuroscience theories, perception and action are two separate functions with different objectives and neural substrates. According to Bayesian decision theory (Robert, 2007), the goal of perception is to provide an accurate estimate of the agent's state, whereas the goal of action selection is to maximize its expected utility. The former process is a precondition for the latter, implying an outdated, serial view of cognitive processing. Active inference instead holds that perception and action cooperate to minimize free energy, by minimizing divergence and maximizing evidence, respectively (Parr et al., 2022). Another example is the fact that in 20th-century cognitive science, working memory was considered a separate store that can be accessed by other components when needed, thereby imposing a separation between information storage and information processing. In contrast, active inference models of hierarchical perception and action (Friston et al., 2021; Pezzulo et al., 2018) treat memory of the previous state as intrinsic to belief updating under generative or world models, across multiple timescales, which is in keeping with 21st-century accounts of working memory (Hasson et al., 2015).
Third, active inference has the potential to unify different "levels of understanding" of cognitive processes. Marr famously introduced a distinction between computational, algorithmic and neural implementation levels and argued that progress can be made within each level and by connecting different levels (Marr, 1982). Establishing links between theories that operate at different levels is often challenging. Active inference helps establish firm relations across levels of description. Rather than Marr's tripartite distinction, in active inference it is more common to appeal to a distinction between normative theory and process theory (Friston et al., 2017). Free energy minimization is the normative objective of living organisms, whereas predictive coding and variational message passing are process-level theories that describe how the brain might support free energy minimization. Importantly, as shown by Friston (2005), under certain assumptions predictive coding can be directly derived from the minimization of variational free energy, connecting the two levels of explanation. A similar case can be made for the variational message passing schemes proposed to support discrete active inference in neural circuits (Friston et al., 2017).
Fourth, unification endows existing constructs with validity, via the application of active inference across domains. One example is the development of theories of interoceptive inference and autonomic control (Barrett & Simmons, 2015; Pezzulo, 2014; Seth et al., 2012) by analogy with the functioning of action control (Adams et al., 2013). In this perspective, autonomic control works exactly like action control (namely, it aims to minimize a discrepancy between a predicted and a sensed signal), except that the "signal" refers to interoceptive streams rather than proprioceptive streams. Another example can be found in computational psychiatry, where numerous accounts of psychopathology appeal to a single mechanism: namely, aberrant precision control.
Fifth, active inference has the potential to reconcile (or at least to contextualize) theoretical perspectives that have long been considered at odds in psychology, neuroscience and philosophy. One example is the Helmholtzian view that perception constitutes an inference about the entities of the external world that cause our sensations (Helmholtz, 1866) and the Gibsonian view that perceiving consists in seeing action opportunities and affordances, not reconstructing a model of the external reality within the brain (Gibson, 1979). This apparent dialectic could be dissolved by considering that there are multiple ways to design generative models; specifically, a relevant distinction is between generative models that explicitly model the ways external states produce sensations (a.k.a., environmental models) and those that model the ways actions produce sensations (a.k.a., sensorimotor models) (Sims & Pezzulo, 2021; Pezzulo et al., 2023). Some active inference studies use generative models that include explicit beliefs about entities in the external world that cause sensations, such as one's location in space (Friston et al., 2017). Other active inference studies use generative models that only consider the sensory consequences of one's action, such as touch sensations that follow whisking at a given amplitude, but not explicit beliefs about objects 'out there' (Mannella et al., 2021). The latter generative models adhere more closely to the notions of affordance (Gibson, 1979) and of sensorimotor contingency (O'Regan & Noe, 2001), despite the fact that they still entail inferential dynamics. Besides this specific topic, there is a lively debate in philosophy that concerns the most appropriate way to consider active inference, in relation to internalist (Hohwy, 2013), externalist (Clark, 2013) or enactivist theories (Bruineberg et al., 2018).
Finally, and importantly, the integrative perspective of active inference could be valuable in characterising sentient behaviour, considered here to be the capacity to infer states of the world and to act upon it with a sense of purpose (Friston, Da Costa, et al., 2023). This operational definition is satisfied by active inference when, and only when, the generative model includes the consequences of action (mathematically, when the generative model includes priors over policies based upon expected free energy). This notion of sentience does not have any phenomenological commitments and is probably best read as 'basic sentience' in the sense of Clark (2023).
Recently, there has been a proliferation of advanced Generative AI systems that process language, images and videos with very high accuracy. However, in most cases, these systems learn passively from large predefined datasets and disregard agency, and the possibility of acting upon the world with a purpose, as a means to develop genuine understanding (Pezzulo et al., 2023). Active inference suggests a different path to understand and simulate sentient behaviour, which focuses on the development of grounded world (i.e., generative) models, by actively engaging with the environment and by predicting the consequences of the requisite interactions. An open question for future research is whether the enactive and embodied approach of active inference has the potential to complement and advance the development and deployment of Generative AI.

Opportunities for the future
"It's difficult to make predictions, especially about the future." (Niels Bohr)
The compass of active inference is expanding rapidly, but the landscape of future opportunities may be even broader. Here, we focus on some of the developments that we consider most promising and most likely in the near future.
The first and perhaps most obvious direction for the future regards a deeper empirical scrutiny of active inference. A question that is sometimes asked of active inference is whether any empirical findings could offer evidence for or against the framework. This can be a vexed question to answer, as it constitutes a category error: a framework is not in itself a hypothesis; it is a way of formulating hypotheses. The relationship between active inference and empirical psychology is that we can formalize psychological theories in terms of the generative models that underwrite neurophysiological and behavioural responses. Equipped with a proposed model, the framework can be used to express a hypothesis, to predict the behaviour expected under that hypothesis, and to fit the model to measured data in order to formally compare alternative hypotheses. In other words, while active inference is an application of the free energy principle, which is a principle (i.e., method) rather than a theory (Friston, 2010), theories tested under the active inference framework (e.g., those considered in this article) make specific empirical predictions that can (and need to) be empirically validated. One example of this is the oculomotor delay period model shown in Fig. 5, which generates empirically testable predictions about oculomotor performance as a function of varying delay periods (Parr & Friston, 2019b). Various empirical studies are already addressing the empirical predictions of predictive coding, such as how top-down and bottom-up dynamics support predictions and prediction errors, respectively (Walsh et al., 2020). However, active inference makes a number of specific predictions about (for example) the way the motor system works (Shipp et al., 2013) and the way higher cognitive functions are implemented (Pezzulo et al., 2018) that differ from mainstream theories and could be increasingly scrutinized by future studies.
Box 1. Glossary of technical terms.
Active Inference: A normative framework that elucidates the neural and cognitive processes underlying sentient behavior, beginning with first principles. This framework posits that perception and action work in concert to minimize a shared functional known as variational free energy.
Expected Free Energy: This is the quantity that is used in active inference to score action sequences or policies (and then to select between them). It takes into consideration both the pragmatic value of policies (how close a policy's expected outcomes are to the preferred outcomes) and their epistemic value or information gain (how much the policy is expected to reduce uncertainty).
Generative Model: A statistical model designed to explain the generation of observable content from unobservable, hidden (latent) causes. For instance, it clarifies the process by which a visual object gives rise to an image on the retina. Generative models serve a dual purpose: they allow the generation of novel, synthetic content and support the inference of hidden causes from observable data. From a technical standpoint, generative models encode the joint probability distribution governing both observables and hidden causes.
Latent (or Hidden) Variable: An internal variable within a generative model, referred to as "latent" or "hidden" because it cannot be directly observed, but must be inferred.
Precision and precision-weighting: Precision denotes the inverse of variance (or squared standard error), serving as a measure of the reliability or certainty associated with sensory information. Precision-weighting refers to the fact that in predictive coding and active inference, prediction errors are weighted by their respective precisions, therefore determining the extent to which sensory observations influence the process of updating beliefs.
Predictive Coding: A computational framework in neuroscience that provides a possible neural implementation for the idea that perception consists in a process of inference. In hierarchical predictive coding networks, inference is realized by minimizing (precision-weighted) prediction errors across all hierarchical levels. In turn, this requires bidirectional loops between top-down processes (conveying predictions) and bottom-up processes (conveying prediction errors).
Variational Free Energy: This is the functional (function of a function) that is minimized within the framework of active inference. It is also widely utilized in probabilistic modeling, statistical inference and machine learning. In its simplest instantiation, it corresponds to a summation of prediction errors, which quantifies the deviation of observed data from the predictions of the generative model. More formally, variational free energy serves as an upper bound on the negative logarithm of the evidence, which is the probability of observed data given a model.
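To make the glossary entry on expected free energy more concrete, the following minimal sketch scores two policies in a toy discrete model and converts the scores into a distribution over policies. It is an illustration under stated assumptions rather than an implementation from the cited literature: it uses one common decomposition of expected free energy into risk (divergence between predicted and preferred outcomes) plus ambiguity (expected entropy of outcomes given states), which is related to, but not identical in form to, the pragmatic and epistemic split given in the glossary. The function names, the two-state example, and all numerical values are hypothetical, and the softmax omits any precision parameter.

```python
import math

def expected_free_energy(q_states, likelihood, log_preferences):
    """Score one policy as risk + ambiguity (illustrative decomposition).

    q_states        -- predicted state probabilities under the policy, Q(s | pi)
    likelihood      -- likelihood[o][s] = P(o | s)
    log_preferences -- log prior preferences over outcomes, ln P(o)
    """
    n_outcomes = len(likelihood)
    # Predicted outcomes: Q(o | pi) = sum_s P(o | s) Q(s | pi)
    q_outcomes = [
        sum(likelihood[o][s] * q_s for s, q_s in enumerate(q_states))
        for o in range(n_outcomes)
    ]
    # Risk: KL divergence between predicted and preferred outcomes
    risk = sum(
        q_o * (math.log(q_o) - log_preferences[o])
        for o, q_o in enumerate(q_outcomes) if q_o > 0
    )
    # Ambiguity: expected entropy of outcomes given states
    ambiguity = 0.0
    for s, q_s in enumerate(q_states):
        entropy = -sum(
            likelihood[o][s] * math.log(likelihood[o][s])
            for o in range(n_outcomes) if likelihood[o][s] > 0
        )
        ambiguity += q_s * entropy
    return risk + ambiguity

def policy_posterior(scores):
    """Softmax over negative expected free energy (lower score, higher probability)."""
    lowest = min(scores)  # subtract the minimum for numerical stability
    weights = [math.exp(-(g - lowest)) for g in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-state, two-outcome example (all values are assumptions)
likelihood = [[0.9, 0.1], [0.1, 0.9]]             # P(o | s)
log_preferences = [math.log(0.8), math.log(0.2)]  # outcome 0 is preferred
predicted_states = [[0.9, 0.1], [0.1, 0.9]]       # Q(s | pi), one row per policy

G = [expected_free_energy(q, likelihood, log_preferences) for q in predicted_states]
posterior = policy_posterior(G)
print(G, posterior)
```

In this toy case both policies are equally ambiguous, so the policy whose predicted outcomes lie closer to the preferred outcomes receives the lower expected free energy and the higher posterior probability.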
A second interesting direction for the future is assessing to what extent active inference (and more broadly, the free energy principle) can help us understand the evolution of complex neural circuits and life forms from simpler ones. Active inference suggests a possible path from the simple mechanisms that supported prediction and control in our earlier evolutionary ancestors to the more sophisticated abilities of our species (Pezzulo et al., 2022), but a comprehensive account of the evolution and "phylogenetic refinement" (Cisek, 2019) of living organisms remains to be fully developed (Friston et al., 2023; Friston, Friedman, et al., 2023).
A third interesting direction for the future regards the realization of advanced artefacts, such as AIs and robots, based on active inference. There have already been several successful robotic implementations of active inference, but the full potential of the theory has not yet been reached (Ahmadi & Tani, 2019; Lanillos et al., 2021; Priorelli et al., 2023; Taniguchi et al., 2023). Interestingly, some of the central concepts of active inference, such as the importance of generative models and self-supervised, predictive learning, are becoming central in mainstream research in AI, as testified by the recent successes of generative AIs such as large language models. This creates an important opportunity, since (apart from their obvious technological impact) state-of-the-art AI systems can be valuable in advancing our understanding of living organisms, provided that they incorporate appropriate (design) principles (Pezzulo et al., 2023).

Fig. 1. Generative model and generative process in active inference. This figure, reproduced from Parr et al. (2022), illustrates the structure of active inferential theories of brain function. Our worlds evolve according to some dynamical process that generates observations (y) from hidden states (x*). Our internal models account for observations in terms of hypothetical hidden states (x). Our inferences about these states based upon our observations then drive actions (u) that intervene on the processes generating our sensations.

Fig. 2. Perception and action play complementary roles in the minimization of variational free energy. This figure, reproduced from Parr et al. (2022), highlights the relationship between action and perception via free energy (F). Perception involves minimizing free energy by changing our beliefs (Q) about states (x). This effectively minimizes the divergence (D_KL) between our beliefs and the probability of these states given sensory data (y). Action minimizes free energy through changing those parts of the free energy that depend upon sensory data, notably the evidence or probability of data under our internal model.
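The relationship summarized in this caption can be checked numerically with a small sketch. It assumes the standard decomposition F = D_KL[Q(x) || P(x | y)] - ln P(y) for a discrete hidden state, and it illustrates only the perceptual side: changing beliefs Q so that the divergence term vanishes and F reaches the negative log evidence. The function name and all numerical values are hypothetical and are not taken from Parr et al. (2022).

```python
import math

def variational_free_energy(q, prior, likelihood_y):
    """F = E_Q[ln Q(x) - ln P(y, x)] for a discrete hidden state x.

    q            -- beliefs Q(x)
    prior        -- prior P(x)
    likelihood_y -- P(y | x) evaluated at the observed y
    """
    return sum(
        q_x * (math.log(q_x) - math.log(l_x * p_x))
        for q_x, p_x, l_x in zip(q, prior, likelihood_y) if q_x > 0
    )

# Hypothetical two-state model (all values are assumptions)
prior = [0.5, 0.5]
likelihood_y = [0.8, 0.3]

evidence = sum(l * p for l, p in zip(likelihood_y, prior))  # P(y)
exact_posterior = [l * p / evidence for l, p in zip(likelihood_y, prior)]
surprise = -math.log(evidence)                              # -ln P(y)

# Before perception: beliefs equal the prior, so the divergence is positive
F_before = variational_free_energy(prior, prior, likelihood_y)
# After perception: beliefs equal the exact posterior, so the divergence is zero
F_after = variational_free_energy(exact_posterior, prior, likelihood_y)

print(F_before, F_after, surprise)
```

Under these assumptions, free energy is always at least as large as the negative log evidence, and the two coincide when beliefs match the exact posterior. Action, by contrast, would change the data y itself and hence the evidence term, which this sketch holds fixed.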

Fig. 3. The architecture of predictive coding. This figure, reproduced from Parr et al. (2022), shows the message passing between populations of neurons under a predictive coding scheme as it might manifest in the layers of the cerebral cortex (separated into superficial layers I-III, layer IV, and deep layers V-VI). This shows predictions based upon expectations (μ) being subtracted from ascending signals to compute errors (ε), which are used to update expectations. The subscripts indicate whether we are dealing with fast changing dynamical variables (x) or more slowly changing contextual variables (v), which act to link together different hierarchical levels, with hierarchy indicated by the bracketed superscripts. As we ascend the hierarchy, the variables we deal with become slower, such that the contextual variables at one hierarchical level evolve over the same timescale as the dynamical variables at the level above.
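The core update in this scheme, where expectations change in proportion to precision-weighted prediction errors, can be sketched in a deliberately reduced form. The example below assumes a single level, a scalar expectation, an identity mapping from expectation to predicted observation, and Gaussian noise, so it omits the hierarchy, the dynamics and the laminar detail shown in the figure. The function name, step size and values are hypothetical.

```python
def predictive_coding(y, mu_prior, pi_y, pi_prior, lr=0.05, steps=500):
    """Gradient descent on precision-weighted squared prediction errors.

    y        -- observation
    mu_prior -- prior expectation (the descending prediction)
    pi_y     -- sensory precision (inverse variance of the observation noise)
    pi_prior -- prior precision
    """
    mu = mu_prior
    for _ in range(steps):
        eps_y = y - mu             # ascending (sensory) prediction error
        eps_prior = mu - mu_prior  # error relative to the prior expectation
        # Each error drives the expectation in proportion to its precision
        mu += lr * (pi_y * eps_y - pi_prior * eps_prior)
    return mu

# Hypothetical values: the same observation under high and low sensory precision
y, mu_prior = 2.0, 0.0
mu_precise = predictive_coding(y, mu_prior, pi_y=4.0, pi_prior=1.0)
mu_imprecise = predictive_coding(y, mu_prior, pi_y=0.25, pi_prior=1.0)

# For this linear Gaussian case the fixed point is the precision-weighted average
expected_precise = (4.0 * y + 1.0 * mu_prior) / (4.0 + 1.0)
expected_imprecise = (0.25 * y + 1.0 * mu_prior) / (0.25 + 1.0)
print(mu_precise, mu_imprecise)
```

With high sensory precision the expectation settles close to the observation, and with low sensory precision it stays close to the prior, which is the sense in which precision controls how much sensory evidence updates beliefs.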

Fig. 4. Expected free energy and the way it can be mapped to different formal notions (e.g., Bayesian surprise, risk-sensitive control, expected utility theory) by removing one or more terms, denoted with numbers. This figure, reproduced from Parr et al. (2022), expresses expected free energy in terms of beliefs about trajectories (indicated by the tilde ~). The additional symbols here, not in previous figures, are the π for policies and the C

Fig. 5. A simulated oculomotor delay period task. This figure, taken from Parr & Friston (2019b) (published under a CC BY 4.0 license), shows simulated performance of a simple working memory task under active inference. Although simple, this task calls for planning (of our next saccade), recall (of the target location), and movement execution. The upper left images show a series of frames taken from the simulation, as if we were observing our participant's eyes. The black arrows link these behavioural responses to the view of the stimulus screen from the time of the target (red) presentation to the response phase. A series of black dots show the (cumulative) trajectory of gaze direction. Because this model is formulated to have both continuous (prediction-error minimising) and discrete (sequential planning) parts, we can plot the trajectory both in terms of position and velocity (lower left) and in terms of the sequence of actions taken.