Hippocampal Remapping as Learned Clustering of Experience

The place cells of the hippocampus collectively form distinct maps of each context, a process known as hippocampal remapping. Past work has asked which features of an experience determine which map is used, but no consistent answer has been reached. However, this approach has overlooked that context identification is itself part of a learning process. We suggest that context identification corresponds to an unsupervised clustering problem, where the animal receives a stream of observations and must cluster them in a data-driven manner. Each cluster corresponds to a particular context, and therefore a particular hippocampal map. Formalizing context learning as a clustering problem allows us to capture a range of experimental results that have not yet been explained by a single theoretical framework. In particular, our results highlight the role that learning plays in hippocampal remapping. This model also provides novel predictions, such as the effect of variability in training.


Background
Hippocampal remapping is thought to be a neural correlate of context representation (Colgin, Moser, & Moser, 2008). Context-dependent behavior is ethologically important, and is at least partially hippocampal dependent (Holland & Bouton, 1999; Gershman, Blei, & Niv, 2010). Given this window into high-level representation of context, the field has sought to answer the question, "What aspects of an experience determine context boundaries?" Here, we take a "normative" approach to capturing the phenomenon of hippocampal remapping: we ask what computational problem the hippocampus is solving and then see what insight the constraints on the solution to that problem provide. The question of how to assign observations to contexts, and therefore how to appropriately generalize context-specific knowledge, is a difficult one. One challenge is that the animal has access to no objective context labels. Rather, the animal receives a stream of unlabeled observations, which it must partition appropriately into a series of contexts. Another challenge is that there is no objective indication of how many contexts the animal will be exposed to. Finally, there is no a priori information about the nature (statistics) of a given context: which features are variable and which are consistent, what range of feature values are typical, or, in formal terms, what distribution observations are drawn from in a given context. Fundamentally, this corresponds to an unsupervised clustering problem, where observations are data points for which one must find a partition into clusters (i.e., contexts) that minimizes the number of clusters needed while maximizing variance captured. Each cluster can be seen as a "latent state", which corresponds to a unique generative distribution that generates observations when the world is in that state.
The clustering algorithm that we will be focusing on is Bayesian nonparametric clustering (BNC) (Gershman & Blei, 2012). Other clustering methods will have qualitatively similar behaviors. The main features of BNC are that it allows the number of clusters to be determined flexibly by the data, that clusters can have independent shapes, and that the algorithm provides probabilistic cluster assignments rather than hard assignments. These capabilities address the issues described above: that labels must be inferred and are not given, that the animal does not know a priori how many latent states exist, and that the animal does not know a priori the statistics of the generative distribution in each state.

Model
We model the animal as performing inference over a hidden state c given an observation y. The observation may be multi-dimensional, with each dimension corresponding to a feature of the observation, whether sensory or non-sensory, such as the color of the walls or the task that the animal is performing. The goal is to estimate the probability that each observation y_t was generated by a particular latent state c_t, or equivalently what probability to assign a given list of latent state assignments c for a given list of observations y: P(c|y) ∝ P(c)P(y|c).
There are two factors that go into that probability. One is how likely the proposed partition of observations into latent states is under the Chinese restaurant process (CRP) prior (Gershman & Blei, 2012): c ∼ CRP(α), where α is the concentration parameter that controls the expected number of states (we use α = 10^−3). Given an assignment of labels to previous draws c_{1:t−1}, which has K unique labels, the kth of which has m_k observations associated with it, the process defines the probability of each possible assignment of the novel draw c_t:

P(c_t = k | c_{1:t−1}) = m_k / (t − 1 + α), for an existing state k ≤ K,
P(c_t = K + 1 | c_{1:t−1}) = α / (t − 1 + α), for a novel state.

Fewer latent states are more likely, and the extent to which fewer latent states are favored is a function of α.
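As a concrete sketch, the CRP assignment probabilities can be computed in a few lines of Python (function and variable names are ours, for illustration):

```python
import numpy as np

def crp_assignment_probs(labels, alpha=1e-3):
    """Prior probability that the next draw joins each existing latent
    state, or a novel one, given previous assignments (CRP prior)."""
    t = len(labels) + 1                        # index of the next draw
    states, counts = np.unique(labels, return_counts=True)
    # Existing state k: m_k / (t - 1 + alpha)
    probs = {int(k): m / (t - 1 + alpha) for k, m in zip(states, counts)}
    # Novel state: alpha / (t - 1 + alpha)
    probs["novel"] = alpha / (t - 1 + alpha)
    return probs

# After ten observations assigned to a single state, with a small alpha,
# reusing that state is far more probable than opening a new one:
p = crp_assignment_probs([0] * 10, alpha=1e-3)
```

With small α the prior strongly favors reusing existing states, so the likelihood term must accumulate substantial evidence to justify a new one.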
The other factor is how consistent observations from the same latent state are with each other. More formally, it is the likelihood of the set of observations assigned to a given cluster, marginalizing over all possible parameter settings of the generative distribution:

P({y_t : c_t = k}) = ∫ ∏_{t : c_t = k} F(y_t; θ) dG_0(θ).

We take the generative distribution F to be a Gaussian whose parameters θ are drawn from G_0, a Normal-Gamma distribution.
This prior and likelihood correspond to the generative model in Fig. 1A.
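For a single feature dimension, the marginal likelihood under a Gaussian likelihood with a Normal-Gamma prior has a standard conjugate closed form. A minimal Python sketch (the function name and hyperparameter values are illustrative choices, not the paper's):

```python
import math
import numpy as np

def log_marginal_likelihood(y, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """Log of P(y) = integral of prod_t N(y_t; mu, 1/lam) dG0(mu, lam),
    where G0 is a Normal-Gamma prior -- the standard conjugate closed form."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    kappa_n, a_n = kappa0 + n, a0 + n / 2.0
    # Posterior rate: prior rate + within-data scatter + prior-mean mismatch
    b_n = (b0 + 0.5 * np.sum((y - ybar) ** 2)
           + kappa0 * n * (ybar - mu0) ** 2 / (2.0 * kappa_n))
    return (math.lgamma(a_n) - math.lgamma(a0)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - (n / 2.0) * math.log(2.0 * math.pi))

# Observations tightly clustered near the prior mean are more probable
# under a single state than widely scattered ones:
tight = log_marginal_likelihood([0.1, -0.1, 0.05])
spread = log_marginal_likelihood([3.0, -3.0, 1.5])
```

Because all parameter settings are integrated out, this score automatically penalizes clusters whose members are mutually inconsistent.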
One thing to re-emphasize is that the outputs of BNC are probabilistic. This is important from a normative perspective because much of the time it is impossible to ascertain with certainty whether observations are drawn from the same distribution or not. Future information may change that assessment. The representation of uncertainty about latent state assignment corresponds in an important way with the result that hippocampal maps during two experiences are almost never entirely overlapping nor entirely independent. This "partial remapping" corresponds with the need to represent the inherent uncertainty about whether different observations are actually drawn from identical distributions. We can think of the subset of place fields that remap between two observations as being related to the hypothesis that the observations are drawn from independent distributions, and the place fields that do not remap as related to the hypothesis that the observations are drawn from the same distribution (Fuhs & Touretzky, 2007).

Instructive Example of Model Function
We begin with a simple example of the latent state inference performed by this model. We generate observations from a 2D Gaussian and compute the posterior probability that a novel observation is clustered with the past observations; the peak value of this probability is controlled by α and the spread is controlled by θ_0.
The probability of clustering novel observations with past observations changes over the course of learning. We can apply the same inference process after the model has seen 20 draws from the same distribution (cyan dots in Fig. 1B bottom). The model now has sufficient data so that the posterior probability of a novel observation being clustered with the past observations captures the shape of the Gaussian distribution from which the observations were drawn: it has high spread in feature 1 and low spread in feature 2, corresponding to the statistics of the distribution.
This example illustrates the trade-off of learning: the model can capture the shape of the data, but it requires experience in order to do so.
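This per-dimension learning can be sketched with the conjugate posterior-predictive update: after enough draws, the predictive distribution for each feature dimension mirrors the anisotropy of the generative Gaussian. A minimal illustration under the Normal-Gamma assumptions above (function names, seed, and hyperparameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_params(y, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """Student-t posterior-predictive parameters for 1D data under a
    Normal-Gamma prior (standard conjugate update)."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    kappa_n, a_n = kappa0 + n, a0 + n / 2.0
    b_n = (b0 + 0.5 * np.sum((y - ybar) ** 2)
           + kappa0 * n * (ybar - mu0) ** 2 / (2.0 * kappa_n))
    loc = (kappa0 * mu0 + n * ybar) / kappa_n
    scale = np.sqrt(b_n * (kappa_n + 1.0) / (a_n * kappa_n))
    return loc, scale, 2.0 * a_n       # location, scale, degrees of freedom

# 20 draws from an anisotropic 2D Gaussian: wide in feature 1, narrow in feature 2
obs = rng.normal(loc=0.0, scale=[1.0, 0.1], size=(20, 2))
scales = [predictive_params(obs[:, d])[1] for d in range(2)]
# The learned predictive spread reflects the generative statistics:
# scales[0] (feature 1) exceeds scales[1] (feature 2)
```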

Learned Distinctions
One initial focus of the study of remapping was to ask which sensory aspects had "control" over the hippocampal map, but the answer was invariably that it depended on prior experience (O'Keefe & Speakman, 1987; Knierim, Kudrimoti, & McNaughton, 1998; Bostock, Muller, & Kubie, 1991). One prime example of this is the question of the role of "environmental geometry", or the shape of the recording arena. The first group to record throughout the course of learning found that there was not a consistent relationship between the shape of the recording arena and inferred latent state (Lever, Wills, Cacucci, Burgess, & O'Keefe, 2002). The experiment they performed was to record place cells of rats who were alternately placed in square and circle boxes in the same location in the recording room day after day. What they found was that early in learning, there was limited remapping, indicating that the animal considered the observations to be drawn from the same latent state. Only after extensive experience did the animals remap between the two enclosures (Fig. 2A). This indicates that latent state boundaries change with experience. These effects are hard to explain under a framework where latent states have a priori boundaries. However, they are consistent with the clustering perspective. As emphasized above, clustering allows for capturing arbitrary distinctions, but it requires sufficient data for the evidence in favor of creating a distinction to outweigh the bias against unnecessarily creating extra latent states.
We model these experiments in the following way. We take observations to be 1D for simplicity, where the single dimension is the feature along which the distinction is learned. For example, in the circle-square experiment (Lever et al., 2002), the dimension would be the shape of the enclosure. We generate observations from two Gaussians with µ_1 = −0.5, µ_2 = 0.5, σ_1 = σ_2 = 0.15 (Fig. 2B). These two Gaussians correspond to the two experimenter-defined conditions. We alternate drawing observations from each distribution. After each pair of draws, we ask the model for the relative probability of the hypothesis that all observations up to that point were drawn from a single latent state versus the hypothesis that they had been drawn from two alternating latent states (Fig. 2C). Early in training, the single-latent-state hypothesis is more probable. Having two latent states is less probable under the CRP prior, and the observations are not yet sufficiently inconsistent with the single-latent-state hypothesis. In other words, the likelihood of the observations marginalized over all possible parameters of a single Gaussian is not yet sufficiently small to outweigh the CRP prior bias against adding a second latent state. Only after sufficient data have been observed are the observations sufficiently inconsistent with the single-Gaussian hypothesis for its probability to fall below that of the two-latent-state hypothesis. In this way, the clustering perspective on remapping explains why it takes time to begin to remap between two distinct enclosures even if the distinction is apparent to the animal from the beginning: only through experience can the animal tell if observed variation is consistent enough to merit creating new latent states.
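This simulation can be sketched end-to-end by scoring each hypothesis as its log CRP prior plus the log marginal likelihoods of its clusters. The sketch below uses a Gaussian/Normal-Gamma conjugate form with illustrative hyperparameters and seed; it is not the paper's exact implementation:

```python
import math
import numpy as np

def log_ml(y, mu0=0.0, kappa0=1.0, a0=1.0, b0=0.1):
    """Log marginal likelihood of 1D data under a Normal-Gamma prior
    (conjugate closed form; hyperparameters are illustrative)."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    kappa_n, a_n = kappa0 + n, a0 + n / 2.0
    b_n = (b0 + 0.5 * np.sum((y - ybar) ** 2)
           + kappa0 * n * (ybar - mu0) ** 2 / (2.0 * kappa_n))
    return (math.lgamma(a_n) - math.lgamma(a0)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - (n / 2.0) * math.log(2.0 * math.pi))

def log_crp(labels, alpha=1e-3):
    """Log CRP prior probability of a full assignment sequence."""
    lp, counts = 0.0, {}
    for t, c in enumerate(labels):
        lp += math.log(counts.get(c, alpha) / (t + alpha))
        counts[c] = counts.get(c, 0) + 1
    return lp

rng = np.random.default_rng(1)
square = rng.normal(-0.5, 0.15, 40)   # "square box" observations
circle = rng.normal(0.5, 0.15, 40)    # "circle box" observations

results = {}
for n in (2, 40):                     # early vs late in training
    y = np.empty(2 * n)
    y[0::2], y[1::2] = square[:n], circle[:n]   # alternating presentations
    one = log_crp([0] * (2 * n)) + log_ml(y)
    two = (log_crp([0, 1] * n)
           + log_ml(square[:n]) + log_ml(circle[:n]))
    results[n] = (one, two)
# Early in training the one-state hypothesis scores higher; with
# experience the two-state hypothesis overtakes it.
```

The crossover arises because the CRP penalty for the two-state partition is paid per observation, while the likelihood advantage of two tight clusters over one broad one also grows per observation and eventually dominates.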

Other Phenomena and Predictions
Our model also captures a wide range of other phenomena observed in the hippocampal remapping literature. The observation that hippocampal maps take time to stabilize (Frank, Stanley, & Brown, 2004; S. Leutgeb, Leutgeb, Treves, Moser, & Moser, 2004; Law, Bulkin, & Smith, 2016) can be taken as reflecting the fact that experience is required for accurate estimates of the parameters of the generative distribution for each latent state. The observation of partial remapping in response to changes in movement direction on a linear track but not in an open field (Markus et al., 1995) can be taken as reflecting the fact that inference of distinct latent states would only occur when the observations are clustered. The observation that different animals express different remapping behaviors in response to identical training (Bostock et al., 1991; Lever et al., 2002; Wills et al., 2005) can be understood if each animal has a unique setting of the α parameter controlling willingness to infer novel latent states.
The insight provided by this model leads us to several experimental predictions. We suggest that the range of behaviors observed in the different morph experiments (J. K. Leutgeb et al., 2005; Wills et al., 2005; Colgin et al., 2010) can be recreated in a single experiment by manipulating the amount of pretraining before the morph testing. We also make predictions about the role of variance in training experiences in controlling later remapping behavior.

Broader Significance
Hippocampal remapping provides a window into the animal's subjective sense of context. Our work provides a motivated framework for making sense of a range of empirical observations associated with remapping. It emphasizes the role of learning and experience in context segmentation, and highlights the trade-off between flexibility and speed of learning.
Figure 2: A) From Lever et al., 2002. They compared place cell representations between alternating presentations of square and circle boxes. Field divergence is expressed in percent and represents the fraction of place fields that remap between the two enclosures. The representations are initially similar, but diverge with learning. B) Observations (black dots) are generated from Gaussians centered at −0.5 and 0.5. The model compares the posterior probability of the observations coming from 1 hypothesized latent state (red) or 2 hypothesized latent states (blue). C) The relative probability assigned to the observations coming from 1 latent state or 2 latent states is shown as a function of amount of experience. Early on, 1 latent state is more probable, whereas later 2 latent states are more probable, similar to the empirical observations.