Learning Cognitive State Representations from Neuronal and Behavioural Data

Explaining how neuronal activity gives rise to behaviour and cognition is a central goal of cognitive neuroscience. With the proliferation of larger neuronal datasets, there have been various attempts to abstract representations of the neuronal data. Some methods consider behavioural decoding to be important, while other unsupervised methods such as PCA and autoencoders disregard behaviour altogether. Here, we propose an architecture to learn cognitive state representations which preserve information about both the dynamics and the behaviour. We present a neural network implementation (BunDLe Net) and apply it to calcium imaging neuronal data of the roundworm C. elegans. Our method reveals clear orbit-like trajectories which are recurrent and structured. It also outperforms conventional methods in the field, such as PCA, autoencoders and autoregressors, with regard to dynamical predictability and behavioural decoding accuracy.


Introduction
The rapid development of neuroimaging techniques has resulted in neuronal datasets of ever-increasing detail and complexity. More data, however, does not necessarily translate to a better understanding of brains and neuronal systems. This is because larger datasets, and models of these datasets, are often much harder to interpret (Hoel, Albantakis, Marshall, & Tononi, 2016), even with state-of-the-art tools in computational neuroscience (Jonas & Kording, 2017). One way to deal with this is to create a high-level representation of the neuronal activity (Marr, 1982; Schölkopf et al., 2021).
Commonly-used methods for learning state representations in neuroscience include dimensionality reduction techniques, the majority of which are unsupervised (Kato et al., 2015; Gao et al., 2017). Such approaches have been criticised since they attempt to model the brain in isolation, without recourse to the behaviour it implements (Jonas & Kording, 2017; Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). The resulting state representations are of limited practical use since they are difficult to relate to behaviour, let alone model it.
At the other end of the modelling spectrum are psychologists, who create and employ cognitive models to reason about a subject's behaviour. These models are generally arrived at through empirical behavioural studies. This approach is very useful, and one often makes causal statements about behaviour and cognitive states, for example: the boy started crying [behaviour] because he was afraid [cognitive state of fear]. Such causal statements would become more concrete (and possibly testable) if the cognitive states were to have a grounding in neuronal activity.
In this work we propose a framework to learn cognitive state representations directly from neuronal activity with respect to a behaviour of interest. We first introduce our motivating theoretical principles and a working definition of a cognitive state.
Based on this, we propose a generic architecture for learning neuronal state representations from time-series data. We then implement and evaluate our algorithm on neuronal and behavioural data from the nematode C. elegans.

Motivating theoretical principles
For the scope of this paper, we propose the following working definition of a cognitive state.
Cognitive state: A high-level representation of neuronal activity that contains sufficient information to model a given set of behaviours and their dynamics.
Let the vector X_t represent the neuronal state at time t. We wish to learn a mapping τ : X_t → Y_t, where Y_t is the desired cognitive state representation at time t. Typically, we want the Y-level to be a lower-dimensional and coarser representation of the X-level. Let T_X and T_Y be the transition models at the neuronal and cognitive level, respectively.
Our working definition requires the cognitive level to preserve dynamical information. This ensures that the cognitive level is self-contained and sealed off from fine-grained details at the neuronal level (Hofstadter, 1979). This can be achieved by requiring the following diagram to commute, i.e. T_Y(τ(X_t)) = τ(T_X(X_t)): it should not make a difference whether we start with X_t and first apply τ and then T_Y, or first apply T_X and then τ.
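As an illustrative sketch (not the paper's implementation), the commutativity requirement can be checked numerically. Here τ, T_X and T_Y are toy linear maps, constructed so that the diagram commutes exactly; all dimensions and names are assumptions for illustration:

```python
import numpy as np

# Toy linear example: tau coarse-grains a 10-d neuronal state to 3-d,
# and T_X is constructed so that the diagram commutes with T_Y exactly.
rng = np.random.default_rng(0)
T_Y = rng.standard_normal((3, 3))        # cognitive-level transition model
P = rng.standard_normal((3, 10))         # linear coarse-graining map tau
T_X = np.linalg.pinv(P) @ T_Y @ P        # compatible neuronal-level dynamics

def tau(x):
    return P @ x

def commutes(x):
    # "apply T_X, then tau" must equal "apply tau, then T_Y"
    return np.allclose(tau(T_X @ x), T_Y @ tau(x))

x_t = rng.standard_normal(10)            # a neuronal state X_t
print(commutes(x_t))                     # True
```

Since P has full row rank, P applied after its pseudoinverse is the identity on the cognitive level, so the two paths around the diagram coincide for any X_t.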
Aside from preserving dynamical information, we require τ to preserve behavioural information, so that our abstraction is useful for modelling a specific set of behaviours. At the same time, τ should ideally discard information that is irrelevant to the behaviour, so as to keep the representation as succinct as possible. A simple autoencoder framework would be inadequate, since it is based on state reconstruction and would try to preserve details that are irrelevant to behavioural dynamics (Zhang, McAllister, Calandra, Gal, & Levine, 2021).
Architecture for learning representations

Our architecture (Figure 1) learns τ together with the cognitive-level transition model T_Y from time-series data. Training minimises two losses: a dynamics loss L_D on the predicted next cognitive state, and a behavioural decoding loss L_B (cross-entropy). The latter ensures that Y_t contains the same amount of information about the behaviour B_t as X_t. Both loss terms are weighted by a hyperparameter γ, and the total loss is given by

L_total = γ L_D + (1 − γ) L_B.
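A minimal numpy sketch of such a composite objective; taking the dynamics term to be a mean-squared error and the γ-weighting to be of the form γ L_D + (1 − γ) L_B is an assumption here, and all names and shapes are illustrative rather than BunDLe Net's actual implementation:

```python
import numpy as np

def total_loss(y_pred_next, y_next, b_logits, b_true, gamma=0.9):
    """Composite loss: gamma-weighted dynamics term plus behavioural term.

    y_pred_next : T_Y(tau(X_t)), predicted next cognitive state
    y_next      : tau(X_{t+1}), observed next cognitive state
    b_logits    : decoder outputs over behaviour classes
    b_true      : integer behaviour labels B_t
    """
    l_dyn = np.mean((y_pred_next - y_next) ** 2)        # dynamics loss L_D (MSE)
    # softmax cross-entropy for the behavioural decoding loss L_B
    z = b_logits - b_logits.max(axis=1, keepdims=True)  # stabilised logits
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_beh = -np.mean(log_p[np.arange(len(b_true)), b_true])
    return gamma * l_dyn + (1 - gamma) * l_beh
```

With γ close to 1 the representation prioritises dynamical predictability; smaller γ shifts the emphasis towards behavioural decodability.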

Representation learning on neuronal data
We apply our architecture to learn representations from neuronal data of the nematode C. elegans, which consists of time-series recordings of 109 neurons (Kato et al., 2015). The behavioural data is a time series of human-annotated behaviours that denote the motor state of the worm. The dimensionality of the latent space was chosen to be 3 for ease of visualisation.
To evaluate our algorithm, we compare it with representation learning methods typically used in neuroscience: PCA, an autoencoder, and an autoregressor implemented with an autoencoder architecture (ArAe). The autoencoder was trained with a standard reconstruction loss, while the ArAe attempts to reconstruct X_{t+1} from X_t. In Figure 2, we observe that all representations capture the recurrent nature of the dynamics. For both PCA and the autoencoder, however, there is a drift that drags out the dynamics in an arbitrary direction, thus mapping every sample to a different point in state space. In the representations from the ArAe and BunDLe Net, coarse-graining occurs in a truer sense, i.e. neuronal-level information irrelevant to the behaviour is discarded.
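For reference, the 3-dimensional PCA baseline amounts to projecting the mean-centred neuronal time series onto its top principal components. A minimal numpy sketch, using synthetic data in place of the actual recordings (only the 109-neuron dimensionality is taken from the text):

```python
import numpy as np

def pca_embed(X, n_components=3):
    """Project a time series X (T samples x N neurons) onto its top PCs."""
    Xc = X - X.mean(axis=0)                     # centre each neuron's trace
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T             # (T x n_components) embedding

T, N = 1000, 109                                # 109 neurons, as in Kato et al.
X = np.random.default_rng(1).standard_normal((T, N))  # stand-in for recordings
Y = pca_embed(X)
print(Y.shape)                                  # (1000, 3)
```

Because PCA optimises reconstruction variance rather than behavioural or dynamical relevance, nothing in this objective prevents the drift observed in Figure 2.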
In BunDLe Net's embedding, we observe branching orbit-like trajectories. Note that the learned dynamics are largely deterministic along a given branch; it is only at the bifurcation points that stochasticity is seen to emerge. These bifurcation points may be interpreted as the places where probabilistic decisions are made at the cognitive level. Thus, the algorithm distils out the stochasticity and confines it to local regions in state space. This is a direct result of requiring behaviourally relevant dynamical information to be preserved. Owing to this, the learned representation is visually interpretable and reveals the structured nature of the cognitive state space.

Discussion
In this work, we have presented a generic architecture to learn cognitive state representations from neuronal data, based on the theoretical principles outlined above.

1. These layers are not restricted to typical ANN layers, but can be implemented by any transition model, including variational layers.

Figure 1: Generic architecture for learning neuronal state representations with behavioural decoding

Figure 2: Dynamics of the neuronal data in the 3-dimensional latent space learned by various algorithms