A new multi-level modeling framework provides evidence for the simulation of object dynamics in the dorsomedial frontal cortex.

Cognitive science explores structured representations of the world that together support the richness of perception and the flexibility of cognition. Existing frameworks for exploring representations in the mind are either not straightforwardly relatable to neurobiology, or they only allow black-box hypotheses based on architectural choices, loss terms, and training on large datasets. Here, we present a new multi-level modeling framework for testing how world models are implemented in neurobiology. Building on a recent method that programs the weights and connectivity of recurrent neural networks (RNNs), instead of training them, we enable direct realizations of candidate structured world models in neurally mappable models. We illustrate this approach in a fundamental aspect of cognition: mental simulation. We program the video game “pong” in an RNN that readily supports model-based game-play via a simple linear decoder. We show that this programmed RNN quantitatively predicts population-level activity in the frontal cortex of monkeys playing pong, and critically, unlike traditionally trained RNNs, reproduces a key non-linearity observed in the neural populations. This work establishes a novel multi-level computational framework that promises to reverse-engineer the neural building blocks of mental life.


Introduction
A central challenge for cognitive science and neuroscience is to understand how the physical world is represented in the brain so as to support flexible planning and behavior. We refer to these richly structured projections of reality as "world models"; they include representations of objects with physical properties, scenes with navigable surfaces, and events with temporally demarcated dynamics (Fig. 1A). How are world models - and in particular, objects and how they react to forces - implemented in neural populations and dynamics?
Probabilistic models of cognition (Chater, Tenenbaum, & Yuille, 2006) and task-optimized deep neural networks (DNNs; Yamins & DiCarlo, 2016) are existing frameworks that aim to reverse-engineer such representations. Probabilistic models in cognitive science explore the structure of mental representations (e.g., as mathematical descriptions of generative models and inference). The typical toolkit for creating these models involves high-level software tools (e.g., probabilistic programming packages, off-the-shelf computer graphics systems; Fig. 1B), constraining them to be most readily testable on behavioral data, often without a straightforward mapping to neural implementation. Task-optimized DNNs have become increasingly useful in computational neuroscience, but they typically only allow for explorations of largely black-box representations based on architectural choices, loss terms, and training on large datasets. Thus, there is no general-purpose framework that cuts across levels of analysis to allow exploration of structured world models while remaining testable in neurobiology. Here, we present a new multi-level modeling framework - programmable RNNs - that renders hypotheses about structured world models directly testable in neural data. In this framework, we specify representational hypotheses as symbolic functions of variables, and automatically program the weights and connectivity of biologically plausible RNNs to compute these specifications (Fig. 1C). (This is in contrast to the standard training of neural networks on a dataset of function evaluations or input-output pairs.) We accomplish this direct compilation of symbolic into neural by building on a recent method from the physics of dynamical systems (Kim & Bassett, 2023), based on fundamental ideas in the linearization and control of nonlinear systems (Strogatz, 2018).
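As a deliberately simplified illustration of programming a readout rather than training a network end-to-end, the sketch below fixes a random RNN, specifies a symbolic output as a linear function of the inputs, and solves a single linear least-squares problem for the read-out matrix. Note that Kim and Bassett (2023) derive the readout analytically via linearization; solving the linear problem numerically from sampled hidden states, and all sizes and scales here, are our assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small RNN with FIXED random recurrent weights A and input weights B.
# (Illustrative stand-in; the paper's method constructs these analytically.)
N, k = 200, 2                                        # hidden units, inputs
A = 0.9 * rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # stable recurrent weights
B = rng.normal(0.0, 1.0, (N, k))                     # input weights

def run_rnn(x_seq):
    """Iterate r_{t+1} = tanh(A r_t + B x_t) and return all hidden states."""
    r, states = np.zeros(N), []
    for x in x_seq:
        r = np.tanh(A @ r + B @ x)
        states.append(r)
    return np.array(states)

# Symbolic output specification: o(x) = x_1 + 0.5 * x_2. Small input
# amplitudes keep the network near its operating point, where the hidden
# state is approximately linear in the input history.
T = 2000
X = rng.uniform(-0.1, 0.1, (T, k))
R = run_rnn(X)
O = X[:, 0] + 0.5 * X[:, 1]

# "Program" the read-out W by solving one linear problem: W r ≈ o.
W, *_ = np.linalg.lstsq(R, O, rcond=None)
corr = np.corrcoef(R @ W, O)[0, 1]   # near 1 when the linearization holds
```

No gradient descent or exemplar-based training is involved: the only free parameters are set by one linear solve against the symbolic specification.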
We illustrate this approach by reverse-engineering a core aspect of cognition: the ability to mentally simulate how objects move and react to forces. A recent study explored this ability in macaque monkeys by having them play the video game "pong" while recording neural activity in the dorsomedial frontal cortex (Rajalingham, Sohn, & Jazayeri, 2022b; Rajalingham, Piccato, & Jazayeri, 2022a). Previous modeling work in cognitive science explores runnable mental models of how objects move and react to forces - akin to the physics engines in computer graphics that approximate Newtonian mechanics for efficient simulation - as a plausible internal model for guiding planning and action in such scenarios (Battaglia, Hamrick, & Tenenbaum, 2013). To test whether the neural dynamics in these frontal cortex populations compute approximate object simulations, we program a neural "pong engine" in an RNN and drive it with model-based game-play. We show that the internal states of this RNN not only quantitatively predict the similarity structure of the neural dynamics, but also, crucially, explain a key non-linearity observed in these populations that cannot be explained by traditionally trained RNNs.

A new multi-level modeling framework
To enable a multi-level inquiry of world models in the brain, we build on a recent method (Kim & Bassett, 2023) that programs the weights and connectivity of RNNs to compute symbolically specified functions. This is accomplished via appropriate linearization (i.e., without sacrificing accuracy) of the RNN hidden state as a symbolic function of its inputs. A schematic of the RNN and its equations is shown in Fig. 2A, including its input x ∈ R^k, hidden state r ∈ R^N, and output o ∈ R^m, with weight matrices B, A, and W connecting pairs of these quantities. To make the RNN programmable, we decompose its hidden state r as a polynomial in the inputs via linearization around an operating point (Fig. 2B; see Kim and Bassett (2023)). Given a symbolic output specification o (Fig. 2C), which we illustrate for a linear dynamical system, we can readily program a read-out matrix W by solving a simple linear equation (Fig. 2A, bottom). Crucially, this programmed read-out matrix can be embedded as the connectivity matrix of the RNN by creating a feedback loop (Fig. 2D), enabling functional compositionality. For the linear system, when we connect the position back in (and keep velocity as input), the feedback recurrence generates trajectories (Fig. 2E).

Programming a neural pong engine
We hypothesize that while playing pong, monkeys deploy an internal model of the game dynamics (e.g., ball movement, wall collisions) and use it to guide paddle control. To test this, we model the setting in Rajalingham et al. (2022b) (Fig. 1A) using our programmable RNNs. We first describe the state-space rules of the game of pong (Fig. 3A) as an output matrix, and program a feedback RNN to simulate this state-space. The key challenge in doing so is to devise a circuit logic for detecting (and resolving) collisions. Following Rajalingham et al. (2022b), we address this challenge by programming a neural set-reset latch, relative to the walls and the paddle, that flips the ball's velocity when the position of the ball exceeds the position of a wall or the paddle (Fig. 3B). Via feedback recurrence, this neural collision-detection mechanism and the constant ball speed together update the ball's velocity and position. All state variables, except the paddle position, are folded into the feedback recurrence.
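The state-space rules that the pong engine implements can be sketched in a few lines: constant ball speed, with a velocity sign-flip when the ball crosses a wall. The sign flip plays the role of the programmed set-reset latch (a comparison "sets" the latch, which inverts the velocity component via the feedback recurrence). Court dimensions, time step, and initial conditions below are our illustrative choices, not the paper's.

```python
import numpy as np

BOTTOM, TOP = 0.0, 1.0   # horizontal walls of the court (illustrative)
DT = 0.05                # simulation time step (illustrative)

def step(pos, vel):
    """Advance the ball one time step, resolving wall collisions.

    The comparison (ball height exceeds a wall) stands in for the latch
    "setting"; the sign flip is what the latch applies through feedback.
    """
    pos = pos + vel * DT
    if pos[1] > TOP or pos[1] < BOTTOM:       # wall crossed: latch sets
        vel = vel * np.array([1.0, -1.0])     # flip vertical velocity
        pos[1] = np.clip(pos[1], BOTTOM, TOP)
    return pos, vel

pos, vel = np.array([0.0, 0.5]), np.array([0.3, 0.8])
speed0 = np.linalg.norm(vel)
heights = []
for _ in range(100):
    pos, vel = step(pos, vel)
    heights.append(pos[1])

in_bounds = all(BOTTOM <= h <= TOP for h in heights)   # ball stays in the court
speed_const = np.isclose(np.linalg.norm(vel), speed0)  # sign flips preserve speed
```

Because the collision only flips a sign, the ball's speed is conserved, which is what lets the constant-speed constraint and the latch jointly determine the updated velocity and position.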

Model-based game-play via linear decoding of RNN state
To play the game - i.e., to determine the paddle position at each time step so as to intercept the ball - we linearly regress where the ball will be at the next time step (or K time steps ahead) from the state of the programmed RNN, and feed this predicted position back into the RNN as the paddle position (Fig. 3C). Rajalingham et al. (2022b) created 79 pong conditions - each with a unique pair of initial ball position and velocity - and had monkeys control the paddle to intercept the ball. They recorded from the dorsomedial frontal cortex of two monkeys using Neuropixels probes while the monkeys played each condition multiple times. Their study included a manipulation in which the ball became invisible approximately halfway through each trial. Here, however, our focus is on the visible epochs of the conditions (we do not address uncertainty), so we consider recordings only from the visible period of game-play (0 ms to roughly 1,200 ms). We model the average spiking rate of more than 2,000 isolated units per condition, sampled in independent 50 ms time windows.

Programmed RNNs driven by model-based game-play explain neural dynamics
Following Rajalingham et al. (2022b), we compare the similarity structure (across time and conditions) of the RNN dynamics in our models to that of the neural data. We find that the programmed RNNs explain substantial variance in the data (results shown for 100 programmed networks) and, crucially, do so better than a random policy and no-control matched RNNs (Fig. 3D). The variance explained by traditionally trained RNNs, as reported by Rajalingham et al., is comparable to these control models and less than that of our full model. These model fits are robust to how far into the future the controller predicts the paddle position (Fig. 3E). We also find that the better a programmed RNN is at game-play, the better it is at explaining the neural similarity structure (Fig. 3F).

Non-linear decoding of final intercept point in programmed RNNs and neural populations
In the frontal cortex, the final intercept point becomes decodable non-linearly and early in a trial, whereas existing traditionally trained RNNs do not show such non-linearity (Rajalingham et al., 2022b).
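The model-based controller described above - linearly regressing the ball's position K steps ahead from the RNN state and feeding the prediction back in as the paddle command - can be sketched as follows. Since the programmed RNN itself is not reproduced here, a linear dynamical system whose state encodes the ball's height stands in for it; all names, sizes, and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the programmed RNN: an orthogonal linear dynamical system
# whose hidden state linearly encodes the ball's height.
T, N, K = 500, 100, 5
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))  # orthogonal state transition
w_ball = rng.normal(size=N) / np.sqrt(N)      # direction encoding ball height

R = np.zeros((T, N))
R[0] = rng.normal(size=N)
for t in range(1, T):
    R[t] = Q @ R[t - 1] + 0.01 * rng.normal(size=N)
ball_y = R @ w_ball                           # ball height over the episode

# Decoder: ridge regression from the current state to the height K steps ahead.
X, y = R[:-K], ball_y[K:]
lam = 1e-3
w_dec = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)

paddle_cmd = X @ w_dec                        # predicted future ball height,
rel_err = np.mean((paddle_cmd - y) ** 2) / np.var(y)  # used as the paddle command
```

In the full model, `paddle_cmd` would be fed back into the RNN as the paddle-position input at each step, closing the perception-prediction-action loop.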

In contrast to trained RNNs, we find that the final state of the ball becomes decodable in our model in a non-linear fashion (Fig. 3G), although it does not become decodable as early as it does in the neural data.
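The reason the final intercept is only non-linearly decodable can be seen from the game's geometry: with wall bounces, the intercept is a fold (triangle-wave) function of the ball's current position and velocity, which no linear readout can express. The toy comparison below, with our own illustrative constants, contrasts a linear decoder with a simple non-linear one (random tanh features followed by a linear fit) on exactly this fold function.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 2000
y0 = rng.uniform(0, 1, n)          # ball height at trial start (illustrative)
vy = rng.uniform(-2, 2, n)         # vertical velocity (illustrative)
t_hit = 1.0                        # time until the ball reaches the paddle

def fold(y):
    """Reflect a free-flight height into [0, 1], i.e., apply wall bounces."""
    y = np.mod(y, 2.0)
    return np.where(y > 1.0, 2.0 - y, y)

intercept = fold(y0 + vy * t_hit)  # ground-truth final intercept height

X = np.column_stack([y0, vy, np.ones(n)])

# Linear decoder: closed-form least squares on position, velocity, and bias.
w = np.linalg.lstsq(X, intercept, rcond=None)[0]
lin_mse = np.mean((X @ w - intercept) ** 2)

# Non-linear decoder: random tanh features plus a linear readout.
H = np.tanh(X @ rng.normal(size=(3, 300)))
w2 = np.linalg.lstsq(H, intercept, rcond=None)[0]
nonlin_mse = np.mean((H @ w2 - intercept) ** 2)
# Expect nonlin_mse well below lin_mse: the fold is invisible to a linear map.
```

The same logic applies early in a trial: as soon as the state encodes position and velocity, the intercept is determined, but only through the fold, so it surfaces to non-linear decoders before linear ones.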

Conclusions
Our results suggest that dorsomedial frontal cortex populations implement an approximate simulation of object dynamics to enable flexible behavior. These findings demonstrate the potential of this new multi-level modeling framework for testing how world models are implemented in the brain.

Figure 1: (A) The brain transforms sensory inputs into rich object representations that support flexible behavior. (B) Cognitive models formalize world models, but are difficult to map to the brain. (C) We present a new multi-level framework that enables the exploration of structured world models in neurobiology.

Figure 2: Programming RNNs on symbolically specified functions, instead of training them on exemplars sampled from those functions, as is predominantly done. See text for details.

Figure 3: Programmed RNNs driven by a simple model-based policy explain neural dynamics in the frontal cortex. See text for details.