Velocity coupling of grid cell modules: stable embedding of a low dimensional variable in a high dimensional neural attractor

Grid cells in the medial entorhinal cortex (mEC) encode position using a distributed representation across multiple neural populations (modules), each possessing a distinct spatial scale. The modular structure of the representation confers the grid cell neural code with large capacity. Yet, the modularity poses significant challenges for the neural circuitry that maintains the representation, and updates it based on self motion. Small incompatible drifts in different modules, driven by noise, can rapidly lead to large, abrupt shifts in the represented position, resulting in catastrophic readout errors. Here we propose a theoretical model of coupled modules. The coupling suppresses incompatible drifts, allowing for a stable embedding of a two dimensional variable (position) in a higher dimensional neural attractor, while preserving the large capacity. We propose that coupling of this type may be implemented by recurrent synaptic connectivity within the mEC with a relatively simple and biologically plausible structure.


Introduction
Much of the research on neural coding in the brain is focused on sensory representations, which are driven by external inputs that can be precisely controlled experimentally. In comparison, less is known about neural coding in deep brain structures, in which the dynamics reflect the outcome of internal computations. A notable exception is the hippocampal formation, where neural activity has been linked to high level cognitive variables such as an animal's estimate of its position within its environment (O'Keefe and Nadel, 1978;Moser et al., 2008;Taube et al., 1990), or its estimate of elapsed time within a trial of a trained behavioral task (Manns et al., 2007;Eichenbaum, 2014).
Specifically, the representation of position by grid cells (Hafting et al., 2005) in the medial entorhinal cortex (mEC) has led to new, unexpected insights on the neural coding of such quantities: even though position is a low dimensional variable, it is jointly encoded by several distinct populations of cells (modules -Stensola et al. (2012)), exhibiting periodic spatial responses with varying spatial scales. The spatial responses of all grid cells within a module are characterized by the same grid spacing and orientation, while differing from each other only by a rigid translation. The representation of position by each module is ambiguous about position, but taken together, the joint activity in several modules constitutes a highly efficient and unambiguous neural code (see Burak (2014)). Due to its distributed organization, the grid cell code possesses a high dynamic range, greatly exceeding the performance of unimodal coding schemes such as the representation of position by place cells in the hippocampus (Fiete et al., 2008;Sreenivasan and Fiete, 2011;Mathis et al., 2012;Wei et al., 2015;Mosheiff et al., 2017).
Alongside the potential benefits arising from the combinatorial nature of the grid cell code, the distributed representation of position over several modules and spatial scales poses a mechanistic challenge to the underlying neural circuitry. The difficulty lies in the hypothesized role of the hippocampal formation, and specifically the mEC, in maintenance of memory and spatial computation, as opposed to pure representation. When grid cells update their activity, e.g., based on self motion, they must do so in a coordinated manner, in order for them to coherently represent a position in two dimensional space -a variable of much lower dimensionality than the joint activity of all cells.
Neurons within a module maintain strict relationships in their joint activity (Yoon et al., 2013). These relationships are maintained across environments (Fyhn et al., 2007); under abrupt distortions of the environment; in novel environments, in which stable place fields are absent; under genetic perturbations that disrupt the spatial periodicity in the response of individual cells (Allen et al., 2014); and in sleep (Trettel et al., 2019;Gardner et al., 2019). The rigidity of the correlation structure strongly suggests that the neural activity within a module is tightly coordinated by recurrent connectivity, consistent with attractor models of grid cell activity Fuhs and Touretzky, 2006;Guanella et al., 2007;Burak and Fiete, 2009), which propose that synaptic connectivity restricts the joint activity within a module to lie on a two dimensional manifold. Additional support for attractor models has been recently obtained by imaging activity of multiple grid cells using calcium imaging in rats running on a virtual one dimensional track (Heys et al., 2014;Gu et al., 2018). These studies revealed a relationship between position on the cortical sheet and the preferred firing locations of grid cells, as predicted by the variants of attractor models that rely on distance dependent connectivity.
In contrast to the strong correlation in the activity of neurons within a module, much less is known about coupling of neurons that belong to different modules. A network of grid cells organized in modules, each independently structured as a two dimensional continuous attractor, possesses a 2 dimensional space of accessible steady states. Yet at any given time, continuous motion of the animal corresponds to a two dimensional subspace of the possible local changes in the state of the modules. Considering that noise may corrupt the representation of position in each module separately, the maintenance of a coherent representation of position across modules necessitates some form of coupling between them (Welinder et al., 2008;Sreenivasan and Fiete, 2011;Burak, 2014). Fig. 1A demonstrates the need for this coupling: incoherent drifts in the positions represented by different modules, accrued due to noise, can rapidly produce a joint representation of position that is incompatible with any position in the close vicinity of the animal. The desired coupling across modules is more subtle than the one observed within a module: the coupling should restrict changes in the states of different modules to lie within the two-dimensional sub-space that corresponds to smooth movement of the animal within its local environment. However, to preserve the high dynamic range of the code, the coupling should not restrict the dimensionality of the accessible steady states.
To further illustrate this point, it is instructive to consider an analogy of grid cell coding to the representation of a one dimensional position by the rotation angles 1,2 of two meshing gears (Fig. 1B). We imagine that motion along the one dimensional axis corresponds to coordinated rotation of the two gears (Fig. 1B, bottom). If the radii 1,2 of the two gears are incommensurate, a large distance is traversed before the two meshing gears come close to a previously visited state, thus allowing for a large range of positions to be unambiguously represented. However, it is crucial in this scheme that during continuous motion, the gears rotate in a coordinated manner:̇ 1 1 =̇ 2 2 . This relationship between the phase velocitieṡ 1 anḋ 2 is enforced by the meshing cogs along the circumference of the two gears. In the absence of this mechanical constraint, small movement of one gear relative to the other can abruptly transport the represented position to a distant location, unrelated to the original position. Note that the absolute angles of the two meshing gears are not constrained: in fact, the large capacity of the representation relies on the fact that any combination of the two angles is accessible.
Motivated by this analogy, we ask whether synaptic connectivity between grid cell modules can enforce a similar dynamic relationship between their states. Below, we identify a simple form of synaptic connectivity between grid cells that can implement this desired form of coupling. Next, we show that the recurrent connectivity confers the joint representation of position with resilience to two types of noise: First, noise in the velocity inputs projecting to different modules. These may differ in different modules, for example, due to synaptic noise. Second, noise arising from the stochastic dynamics of spiking neural activity within each module. The outcome is a continuous-attractor representation of position that achieves two goals: First, the representation is distributed across several modules with different spatial scales, allowing for combinatorial capacity by preserving the high dimensionality of the accessible steady states. Second, the neural circuitry that supports this representation is highly robust to noise when updating the representation based on self-motion, or while maintaining a persistent representation in short-term memory.
Alongside the recurrent connectivity, it is plausible that feedforward synaptic projections from the hippocampus to the entorhinal cortex play a role in shaping the grid cell response (Kropff and Treves, 2008;Dordek et al., 2016;D'Albis and Kempter, 2017;Weber and Sprekeler, 2018). Thus, hippocampal inputs may aid in coupling the states of different grid cell modules Illustration of the detrimental consequences arising from uncoupled module drifts. Black dot: location of a static animal. Top panels: schematic representation of the decoded position from the neural activity in module 1 (left panel) and module 2 (middle panel) at time = 0 . The shaded areas (cyan, purple) represent locations whose likelihood, given the neural activity, is high. Top right panel: overlay of the likelihood read out from module 1 and module 2. The maximal likelihood location, based on activity in both modules, coincides with the animal's position. Bottom panels: decoded position based on the neural activity at time 1 . Due to independent, noise driven drifts in each module, activity in module 1 represents positions that are slightly shifted to the left (bottom left), and activity in module 2 represents position that are slightly shifted upward (middle). Even though the shifts are small, the joint activity in both modules (bottom right) now represents a new maximum likelihood location (yellow), far away from the true location (black). We refer to such events as catastrophic readout errors. B. Representation of position along a one dimensional axis (black line) by the rotation angles 1,2 of two meshing gears that rotate in a coordinated manner in response to motion. The angles 1 and 2 , corresponding to each position, are shown along the blue and red lines. If the radii 1,2 are incommensurate, a large range of positions can be unambiguously read out from the combination of the two angles. (Welinder et al., 2008;Sreenivasan and Fiete, 2011). In addition, sensory inputs that carry information about the animal's absolute position may contribute as well, through synaptic projections to the mEC from other cortical areas. However, there are situations in which these types of inputs to the mEC cannot ensure appropriate coordination between grid cells modules. First, under conditions in which sensory inputs are absent or weak, the brain must rely on idiothetic path integration in order to update its estimate of position. Second, in novel environments, and following global remapping in the hippocampus (Muller and Kubie, 1987), it is highly unlikely that specific connections between place cells and grid cells, that couple the two spatial representations, are immediately established. Hence, coupling modules via hippocampal inputs would be ineffective in a novel environment. Thus, in this study we focus on the ability of recurrent connectivity within the entorhinal cortex to maintain a coherent representation of position across grid-cell modules.

Theoretical framework
In laying out the principles underlying our proposed synaptic connectivity, we consider first a one dimensional analogue of the grid cell representation, inspired by the analogy to meshing gears discussed above: we imagine that an animal moves in one dimension, and neurons in each grid cell module jointly represent the modulus of position relative to the grid spacing (for simplicity, from here on, we define the phases such that they are in the range [0, 1]): We hypothesize that the joint dynamics of all grid cells within a module are restricted to lie in a one dimensional attractor, which we model as a ring attractor (Ben-Yishai et al., 1995;Zhang, 1996). More specifically, we consider the double-ring architecture ( (2) the vector of synaptic activations, where ⃗ and ⃗ represent the synaptic activation of the right and left sub-populations respectively. The synaptic activation of neuron follows the dynamics:̇ where is the synaptic time constant, is the firing rate of neuron , is a nonlinear transfer function, is the connectivity matrix (Eqs. 8-9), 0 is a constant input, and + (− ) is the velocity input to a neuron in the right (left) sub-population. Synaptic weights projecting from neurons in each ring are shifted clockwise (right) or anti-clockwise (left). When both subpopulations receive identical feed-forward inputs, activity in the network settles on a stationary bump of activity. However, selective activation of the right (left) sub-population via the feedforward inputs, induces clockwise (anti-clockwise) motion of the activity bump at a phase velocity proportional to the velocity input . Hence, in a noise-free network, the position of the activity bump is an integral of the velocity input. True phase as a function of time (black) in response to an external velocity input, representing a simulated trajectory, and an estimation of this phase from the velocity approximation 1 (magenta). Note that is periodic with period 1, but for presentation clarity we unwrap to depict a continuous path along the real axis. C. The coupling of drifts in two modules is achieved by providing the velocity approximation 1 as a velocity input to module 2 (and vice versa, not shown). Each module is modelled as a double ring attractor, as in (A). D. Two coupled modules. The velocity input of each module has three contributions: The external velocity input (purple), the coupling of velocity from the other module (green), and the self coupling (orange).
Our goal is to couple several such modules such that they will update their states in a coordinated manner in the presence of noisy inputs. It is essential to couple the modules based on their drift velocities and not by their absolute phases, as we want to enable all phase combinations of the different modules to be possible steady states of the population neural dynamics. Our proposed coupling requires two ingredients: reading out the phase velocity of each module, and inducing corresponding phase velocities in the other modules. The double ring model already contains a mechanism for integration of velocity inputs, and therefore our main challenge is in reading out the phases velocities.

Simple neural readout of velocity
Our first goal is to read out the phase velocity of a single module in our system. It is possible to compute the drift velocity by projecting Eq. 3 on the eigenvector with zero eigenvalue of the dynamics (see Appendix 1 and Burak and Fiete (2012)). However, this projection cannot be evaluated linearly from the neural activity, since the projection coefficients depend on the location of the activity bump. Instead, we seek a simple estimate of the drift velocity, that can be implemented in a neural circuit with relatively simple architecture in a biologically plausible manner.
Intuitively, in the described framework, most of the motion arises from the differences in activity between the right and left sub-populations. Therefore, this difference might be close to the drift velocitẏ . We find, indeed, that the difference between the synaptic activities of the right and left sub-populations, provides a good approximation for the phase velocity (Fig. 2B), where is a proportionality factor. In Appendix 1 we show mathematically that ≈̇ .

Coupling modules by synaptic connectivity
In order to couple the motion of different modules, we use the readout signal of each module (Eq. 4) as a velocity input to all other modules ( Fig. 2C-D, green arrows). In addition, we include negative self coupling within each module using the same readout signal (necessary, as shown below, in order to prevent instabilities that otherwise arise from the positive feedback generated by the inter-module couplings), Fig. 2D (orange arrows).
Note that is a linear function of synaptic activities within the ring network, with coefficients that do not depend on the position of the activity bump. Thus, the coupling can be implemented by recurrent connectivity within the mEC, between modules and within a single module.
To understand how the couplings influence the joint dynamics of the coupled modules, we analyze the response of a network, consisting of coupled modules, to external velocity inputs, ⃗ ( ). The position of the bump in each module can be represented by a phase . We find that the dynamics of these phases are governed by the following set of coupled differential equationṡ⃗ where is an × matrix whose element represents the coupling strength from module to module , * ̇⃗ is the convolution oḟ⃗ with an exponential filter with the synaptic time scale (Eq. 36), and is a constant factor (see full derivation of Eq. 5 in Appendix 2). Thus, the phase of each module is updated in response to two signals: the external velocity input projecting into the module (first term in Eq. 5), and the recent history of changes in the phases of the other modules, conveyed by the coupling signal (second term in Eq. 5). Much of the system's response to external velocity inputs can be understood by considering its dynamics under the assumption that the motion of the animal is sufficiently slow, such that the components oḟ⃗ vary slowly compared to the synaptic time constant. Under this assumption, we obtain from Eq. (5)̇⃗ where is the linear response tensor.
For simplicity, let us consider first only two coupled modules (each of them one dimensional), with identical self coupling strength for both modules. The eigenvalues of , denoted by + and − (Fig. 3A-C and Appendix 2), indicate how strongly the coupled modules respond to velocity inputs that drive coordinated and relative motion, respectively. If − is small, the modules respond weakly to velocity inputs that attempt to update the phases in an uncoordinated manner. Thus, if − is much smaller than + , we expect the motion of the two modules to remain coordinated, even if the velocity inputs to the two modules differ.
We choose coupling parameters 1,2, such that three requirements are fulfilled (see Appendix 2): First, the modules should respond significantly to inputs that drive coordinated motion (large + , Fig. 3A,C). The response to such inputs should not be suppressed since the system must be able to update its state based on velocity inputs, to correctly represent the animal's position in its environment. Second, the modules should respond very weakly to inputs that drive anti-correlated motion (small relative motion − , Fig. 3B,C). The self negative coupling enables us to achieve these two requirements while preserving stability ( Fig. 3A-C and Appendices 2-3). Our last requirement is the maintenance of a specific ratio between the module phase velocities, , that corresponds to the grid spacing ratio between successive modules (we set = √ 2 for all modules Stensola et al. (2012)). Fig. 3E-F demonstrates the response of two modules to an external velocity input, representing an animal's trajectory (shown in Fig. 3D). The input is given only to module 1. In the the case of uncoupled modules ( 1,2, = 0) only module 1 follows the trajectory, as expected (Fig. 3E). In the case of coupled modules, both of the modules follow the trajectory quite accurately, with the desired phase velocity ratio . Hence, under these conditions, the two coupled modules shift in a coordinated manner, even if they receive incompatible velocity inputs. Next, we generalize these results to multiple modules, and to grid cells in two dimensions.

Generalization to two dimensions and several modules
The coupling of modules, described so far, can be easily extended to grid cell responses in two dimensions. In accordance with grid cell responses in two dimensional arenas, each module is structured as a two dimensional attractor, whose state is determined by two periodic phases. Thus, the steady states of the attractor are arranged on a torus instead of a ring. To obtain this topology, we use a network architecture in which neurons are arranged on a parallelogram, corresponding to a unit cell of the hexagonal grid (in similarity to Guanella et al. (2007)). The synaptic connection between any two neurons depends on their distance on the parallelogram, defined using periodic boundary conditions ( Fig. 4A and Methods). The position of the activity bump must be able to shift in response to a two-dimensional velocity input, in any direction in the plane. To implement this velocity response, each module contains four sub-populations (right, left, up, and down). The synaptic weights projecting from neurons in each sub-population are shifted in a corresponding direction within the neural population (Burak and Fiete, 2009).
In addition, we generalize our network to grid cell modules. The coupling strengths thus comprise 2 parameters that we are free to adjust to fulfill a set of requirements, similar to those applied in the case = 2. Our most important goal is that the motion of Velocity response of the coupled system to velocity inputs that drive joint motion + (A) or relative motion − (B) in two coupled modules, as a function of the coupling strengths. C. ± is a function of one parameter that depends on the coupling strengths ( ± √ 1 2 ). The magenta and black circles represent the parameters used in (F), and the corresponding value of + and − . D. Simulated trajectory, whose derivative is injected as a velocity input only to module 1 in panels (E-F). E. Response of two uncoupled modules ( 1,2, = 0): the position represented by module 1 tracks the velocity inputs, module 2 is unresponsive, and the updates in the two modules are not coordinated, as expected. F. Same as (E) but the modules are coupled with coupling strengths: = −20, all modules should be coordinated, even if the velocity inputs are not identical. To achieve this goal, we define a joint motion vector ⃗ , such that ∕ is the ratio of grid spacings of modules and . We require that this vector is an eigenvector of the linear response tensor, and minimize the eigenvalues corresponding to all other eigenvectors. If we were able to obtain eigenvalues that precisely vanish, the response tensor would be a rank one matrix whose columns are all proportional to ⃗ . Under this idealized outcome, any velocity input, regardless of its direction in the 2 dimensional input space would result in coordinated motion of the modules. However, the couplings that precisely achieve this goal diverge, in similarity to the two-module case (Appendix 2). Thus, we impose a constraint on the strength of the synaptic connections. Another constraint has to do with the stability of the response to dynamic velocity inputs. We optimize an appropriate target function under these constraints (Appendix 3).
For = 2 the optimization results in the same solution of coupling parameters that we found previously. For > 2 we find that there is considerable freedom in choosing combinations of that achieve satisfactory coupling (See Appendix 3). One principled way to reduce this freedom, is to require that there is connectivity only between successive modules. This choice is compatible with recent observations (Fu et al., 2018) that excitatory synaptic connectivity within the mEC is relatively short ranged. In our numerical results, we use this assumption to constrain the structure of the connectivity matrix , but other choices that include broader connectivity between modules lead to similar coupling between the modules.
To demonstrate how our proposed coupling affects the response of the modules to velocity inputs, we simulate the described network in two dimensions, with three modules. The velocity input is a measured rat trajectory from Stensola et al. (2012), with the addition of white Gaussian noise, drawn independently in the three modules. In a noise-free simulation, the single cell firing rates form a hexagonal grid pattern as a function of the animal's location (Fig.  4B), as expected from the network structure, while the spacing ratio between modules is close to .
The trajectories of the 2d phases, in response to noisy velocity inputs, are shown, for each of the three modules, in Fig. 4D (uncoupled modules) and Fig. 4E (coupled modules). In both cases, the phases follow the animal's trajectory ( Fig 4C) quite closely, but the phases are much more similar to each other, and to the original trajectory, in the coupled case. The lack of deviations between the phase trajectories, seen in the coupled case, is an essential difference between the dynamics of the coupled and uncoupled modules. As discussed in the Introduction, we expect this difference to strongly impact the stability of the grid cell code. In the following section we substantiate this point.

Consequences for spatial representation and readout
We next aim to validate our hypothesis that the coupling of modules stabilizes the grid cell code, and more specifically, prevents catastrophic errors that can be caused by uncoupled drift in the phases of different modules (Fig. 1A). We simulate the dynamics of our three module network with noisy velocity inputs based on a measured rat trajectory from Stensola et al. (2012), as in Fig. 4. We then generate Poisson spikes from the instantaneous firing rates of the  2012)), whose derivative is injected as a velocity input to all modules in panels (D-E), with addition of uncorrelated noise in each module. D. Response of three uncoupled modules. E. Response of three coupled modules. The phases of the three modules approximately track the velocity inputs in both cases, but the coordination between phases of the three modules is more tight in (E).
neurons, and read out the animal's trajectory from the simulated spikes: we do so both for coupled modules (Fig. 5A) and for uncoupled modules (Fig. 5B). The readout is accomplished using a decoder that sums spikes from the recent history, with an exponentially decaying temporal kernel (see Methods and Mosheiff et al. (2017)).
In the case of coupled modules the decoded trajectory is similar to the input (Fig. 5A), but due to the noise in the inputs, it gradually accrues an error relative to the true trajectory. Without coupling, the position read out from the network activity diverges sharply from the true trajectory (Fig. 5B). Moreover, the readout trajectory is often discontinuous in time, and thus cannot be a good approximation to any reasonable path of the animal. The discontinuity arises from uncorrelated drifts of the modules which, combined with the periodic nature of the grid pattern, cause catastrophic readout errors, much larger than the errors accrued in the phases of each module separately (Fig. 1A).
In order to quantitatively substantiate the relationship between the large deviation of the decoded trajectory from the true trajectory and the occurrence of catastrophic readout errors, we repeat the decoding process a hundred times with and without coupling, for a 20 s input trajectory. In all realizations with coupling, the readout is coordinated with the input trajectory (Fig. 5C, blue). In contrast, without coupling almost all realizations exhibit discontinuities within a time interval of 20 s (Fig. 5C, red). The mean square error (MSE) of the decoder increases as a function of time in the coupled as well as the uncoupled systems (Fig. 5D), as expected due to noise in the input (as the coupling and inputs are of velocity and not absolute location, there is no correcting mechanism that can correct coordinated shifts in the phases of all the modules). However, a few seconds after the start of the simulation, the MSE grows sharply in the uncoupled system (dashed line in Fig. 5D). The time at which this starts to happen coincides with the first appearance of discontinuities in the decoded position (compare Fig.  5 panels C and D). Thus, the dramatic reduction achieved by the coupling between modules arises primarily from the elimination of catastrophic readout errors.

Intrinsic neural noise
Up to this point we presented a theory of several grid cell modules, coupled to each other by synaptic connectivity within the mEC, such that the coupling significantly suppresses incompatible drifts caused by noisy inputs to the system. Next, we wish to address another important source of noise, arising from the variability in the spiking of individual neurons within the grid cell network (Softky and Koch, 1993;Shadlen and Newsome, 1994;Burak and Fiete, 2009). In similarity to noise in the inputs, stochasticity of the neurons participating in the attractor network drives errors that accumulate over time with diffusive dynamics (Burak and Fiete,  2009, 2012). To model this process, we replace the firing rate of each neuron in Eq. 3 by a Poisson spike train (see Eq. 22).
Since we designed our network to be resilient to noisy inputs, it is not obvious that the same architecture can also provide resilience to intrinsic noise. To address this question, we revisit first the simple case of two coupled modules in one dimension. In Appendix 1 we show that the simple readout of velocity used to couple the modules (Eq. 4), is a good approximation for the We repeated the decoding of the animal's trajectory, as in (A-B), over a hundred simulations with different realizations of the noise. A success at time is defined as a trial that did not contain any discontinuity in the readout up to that time. The success percentage was computed by counting the number of trials without discontinuities at each time point. The coupled network maintains 100 % success over time (blue), whereas the success percentage of the uncoupled network decreases significantly over time: many trials contain discontinuities after a few seconds, and almost all of them contain discontinuities after 20 s (red). D. Mean square error (MSE) of the decoded trajectory, computed from the same simulations presented in (C), in the coupled (blue) and uncoupled (red) cases. The vertical black dashed line in (C-D) represents the first time in which a discontinuity was observed in any of the trials of the uncoupled network. From this time point onward, the percentage of success of the uncoupled network descends, and the MSE sharply increases (note the logarithmic vertical scale). drift velocity driven by intrinsic neural noise, suggesting that the coupling introduced previously can help suppress uncoordinated drifts. To quantify the impact of coupling on coordination of the modules, we compute the diffusion tensor of their phases, using Eq. 25 (the calculation is based on the theoretical framework laid out in Burak and Fiete (2012), see specifically Eq. S24). In the uncoupled case the diffusion tensor is isotropic as expected (Fig. 6A, blue line). When the modules are coupled, with the same coupling strengths as in Fig. 3, the diffusion of the two modules is highly anisotropic (Fig. 6A). The first principal axis of the diffusion tensor (red ellipse in Fig. 6A) closely matches the direction of coordinated motion (dashed line in Fig. 6A). The diffusion coefficient − , associated with motion in the orthogonal direction, is much smaller than the diffusion coefficient + , associated with coordinated motion: + ∕ − ∼ + ∕ − ∼ 40 (compare Figures 3C and 6A). Thus, the coupling strongly suppresses incompatible diffusion of the two modules.
Next, we evaluate the consequences for representation and readout, arising from the suppression of incompatible diffusion arising from intrinsic neural noise. We repeat the simulation of three coupled modules in two dimensions, this time with stochastic (Poisson) neurons. Discontinuities in the decoded trajectory occur both in the uncoupled and coupled networks, but they are much more rare in the coupled network (Fig. 6B). Accordingly, the readout MSE is reduced dramatically in the coupled network (Fig. 6C, note the logarithmic scale). Thus, the coupling is effective not only in stabilizing the neural representation in response to noisy inputs, but also with respect to internal stochasticity within the grid cell network.
In principle, one could seek coupling parameters such that diffusion would be suppressed in all directions. However, recall our requirement (1) from section Coupling modules by synaptic connectivity, that the network must respond with sufficient gain to external inputs, to follow an animal's trajectory. As we keep the joint response strong, we cannot reduce the joint diffusion simultaneously, and we are satisfied with coupling of the diffusive drift, without eliminating coordinated diffusion.

Previous works (Fiete et al., 2008; Mathis et al., 2012; Wei et al., 2015; Mosheiff et al., 2017)
have shown that grid cell activity, viewed as a neural code for position, achieves a high dynamic range (ratio of capacity and resolution Burak (2014)) due to the splitting of the representation across multiple modules. In this work we addressed a key difficulty with this idea: the combinatorial nature of the representation, arising from the existence of multiple modules, leads to high vulnerability to noise. Small uncoordinated errors in the phases of the different modules can shift the represented position to a far away location. As a possible solution to this difficulty, we proposed a simple architecture of synaptic connectivity between grid cell modules, that can suppress incompatible drifts. The functional coupling between modules, arising from our proposed synaptic connectivity involves velocities, not absolute phases. Consequently, the coupling does not limit the combinations of possible phases of the different modules, and thus does not affect the capacity of the code.
Similar principles may apply to storage in working memory and coding of other continuous, low dimensional variables in the brain. Thus, the main contribution of our work from the theoretical perspective, is that it identifies a way to couple several low dimensional continuous attractors of dimension (in the case of grid cells, = 2), to produce a persistent neural representation of a single, dimensional variable with high dynamic range. The dynamics of the network are characterized by two seemingly contradictory features: first, the steady states of the system span a space of dimension , where is the number of modules. Second, during maintenance and continuous update of the stored memory, the joint state of the modules is dynamically restricted to lie in a much smaller, -dimensional subspace. This enables the continuous embedding of a dimensional variable in the larger, dimensional space, without allowing for noise to shift the state of the system outside the appropriate, dimensional local subspace.
Our proposed mechanism for coupling modules is complementary to another possible mechanism, of coupling grid cell modules through the reciprocal synaptic connectivity between the entorhinal cortex and the hippocampus (Welinder et al., 2008; Sreenivasan and Fiete, 2011; Burak, 2014). Since biological systems often harness multiple mechanisms to achieve the same function, both mechanisms might act in parallel to stabilize the grid cell code against catastrophic readout errors. As discussed in the introduction, it is highly unlikely that coupling via the hippocampus could work in a novel environment, following global remapping. On the other hand, it is of particular importance for the brain to establish a geometric representation of position, aided by idiothetic path integration, under this scenario. Thus, the velocity coupling mechanism proposed in this work may play an especially important role in generating a cognitive map of a novel environment.
A model that involves synaptic coupling between modules, of a different architecture than the one considered here, has been recently proposed in Kang and Balasubramanian (2018). This model does not explore the consequences of noise on coding stability, and its primary goal is to explain the ratios between grid spacings. Hence, Kang and Balasubramanian (2018) address different questions from those studied in the present work. Nevertheless, it is plausible that the synaptic connectivity proposed by Kang and Balasubramanian stabilizes the dynamics against incompatible motion of the modules. An important difference between the network architecture explored in Kang and Balasubramanian (2018) and the architecture explored here, is that we consider all to all connectivity between grid cells, which is invariant to any static, relative shift in the module phases. Hence, all combinations of phases are steady states of the dynamics. In contrast, the synaptic connectivity considered in Kang and Balasubramanian (2018) is spatially local. Consequently, it tends to produce interlocked patterns of activity in adjacent modules, with shared spatial periodicity, and preferred relative spatial phases. These properties of the activity patterns are expected to limit the representational capacity of the code. Here we addressed a different computational goal, of stabilizing a distributed representation of position over multiple modules, without compromising the dynamic range of the neural coding scheme.
Our focus in this work was on the suppression of relative motion across modules, but noise in the inputs, or in the intrinsic activity within the network, drives also coordinated motion. It is possible to suppress the latter type of random motion by increasing the negative feedback in the system (Fig. 3A,C). In choosing our optimization goal for the coupling parameters, we did not attempt to suppress coordinated drift for two reasons. First, coordinated drift is much less detrimental from the coding perspective than relative drifts, as discussed in the introduction. Second, suppressing the coordinated motion comes with an inevitable cost: a reduction in the gain of the system's velocity response. Nevertheless, it is interesting to consider also the suppression of coordinated drift. Next, we briefly discuss the possible implementation of this goal.
For simplicity, consider a single one dimensional module, structured as a ring attractor: in this situation, there is only coordinated motion. As in any continuous attractor network, stochasticity of neural activity within the network introduces random drift (Burak and Fiete,  2012). In the double ring architecture (Xie et al., 2002), much of the drift arises from fluctuations in the difference of activity between the two sub-populations that drive left and right motion. Using the negative self-coupling of velocities, introduced in this work, it is possible to suppress these fluctuations and substantially reduce the noise-driven drift. It is interesting to compare this mechanism with another proposal (Lim and Goldman, 2014) for stabilization of the memory stored in a single ring attractor, using negative derivative feedback (Lim and Goldman, 2013). In Lim and Goldman (2014) the stabilization slows down the dynamics of all neurons in the network, thereby slowing down the relaxation of any deformation in the shape of the activity bump -not only the position of the activity bump on the ring attractor. In contrast, within the architecture considered here, the unimodal shape of the activity bump is maintained, while the velocity feedback mechanism slows down only noise driven diffusion of its position. Thus, the velocity coupling mechanism identified in this work may be relevant to the stabilization of short-term memory in head directions cells of rodents (Taube, 2007) and insects (Seelig and Jayaraman, 2015), where there is no evidence for slowly decaying deformations in the shape of the activity bump.

Experimental predictions
The grid spacing of a single module is determined by the velocity input strength (Eq. 21): larger values of lead to smaller grid spacing. Thus, in an uncoupled network the grid spacing ratios are determined by the input strengths . However, in the network of coupled modules the spacing ratios are determined primarily by the inter-module coupling parameters. Each velocity input strength affects all the grid spacings, but has little effect on the spacing ratios. For example, even if the s are identical in all modules, or if only one module receives a velocity input, all modules shift their states with a velocity ratio that matches the desired grid spacing ratio. An interesting prediction arises under a scenario in which one of the modules is disconnected from the others. This removes positive couplings from the other modules, but leaves the negative self coupling within the module intact. Hence, the disconnected module is expected to weaken its response to velocity inputs, and increase its grid spacing. Similarly, other grid spacings, of modules that were originally connected to the disconnected module, are expected to increase as well.
The joint activity of grid cells can be expected to lie within a two dimensional space when salient sensory cues are available to the animal, regardless of the existence of an inter-module coupling mechanism. The existence of a coupling mechanism must therefore be tested under conditions in which external sensory cues are weak (Burak, 2014). It is instructive to compare this goal with what has been learned about population activity within a single module (Yoon  et al., 2013; Fyhn et al., 2007; Allen et al., 2014; Trettel et al., 2019; Gardner et al., 2019). In that context, simultaneous recordings from pairs of grid cells were highly informative, since grid cells from a single module exhibit strong correlations (or anti-correlations) in their joint spiking activity. The preservation of these correlations, under conditions in which the animal's sense of position is disrupted, supports an interpretation that the correlations are maintained by recurrent connectivity within each module. In contrast, cells belonging to different modules are expected to fire together in some positions in space, and refrain from firing together in other parts of the environment. Averaged over motion in a large environment, cell pairs from different modules are expected to exhibit weak correlations in their activity, even if the updates of module phases are fully coordinated.
Analysis of spike correlations in grid cells from different modules (Trettel et al., 2019;  Gardner et al., 2019) confirms this expectation. During free running, spike correlation functions of grid cells from different modules are much weaker than those observed within a module. Despite being weak, these correlations can be statistically significant. Their existence originates from the fact that in any specific environment, and especially in small enclosures, the firing fields of two grid cells with different spatial scales slightly favor correlated or uncorrelated firing, depending on the precise overlap between the spatial receptive fields of the two cells. During sleep, these weak correlations are significantly reduced (Trettel et al., 2019; Gardner et al.,  2019). A possible interpretation of this result is that there is no coupling between modules during sleep. However, an alternative explanation is that the reduction in correlations reflects an increase in the repertoire of positions and environments represented in the sleep state: regions of joint firing and regions of disjoint firing are expected to average out more evenly under such circumstances, leading to weaker correlations.
In order to test for the existence of a velocity coupling mechanism, it is desirable to test for correlations in the updates of phases of different modules, instead of testing for correlations in their absolute phases. This will require simultaneous recordings from multiple grid cells, of sufficient numbers that will enable reliable tracking of the module phases. Appropriate recordings are not yet available, but techniques that enable simultaneous monitoring of large neural populations of the mEC (Jun et al., 2017; Gu et al., 2018; Obenhaus et al., 2018) [CITE: see comment below] are likely to enable their acquisition in the coming years. In similarity to the experiments that provided insights on the low dimensionality of activity within each module, it will be necessary to test for inter-module coupling under conditions in which the animal's internal sense of position is not anchored to salient external cues.
Our model assumes that grid cells in the mEC are involved in idiothetic path integration, and harnesses ingredients from models of path integration in the grid cell system to generate the coupling between modules. It is widely hypothesized that grid cells are indeed involved in path integration (Hafting et al., 2005; McNaughton et al., 2006; Moser et al., 2008; Burak, 2014), but this involvement is not experimentally established (see, however Gil et al. (2018)). Accordingly, a specific role of any particular cell type within the mEC in idiothetic path integration is not yet identified. A specific population of cells that may provide the substrate for the connectivity proposed in our model are the conjunctive cells observed mostly in layer III and deeper layers of the mEC (Sargolini et al., 2006), which play a pivotal role in models of path integration in the grid cell system. We note that these cells are tuned to head direction more closely than heading (Raudies et al., 2015), a difficulty that faces all models of path integration within the mEC. The resolution of this difficulty may involve computational elements within the entorhinal circuitry that have not yet been identified. Thus, future experimental findings concerned with the mechanisms underlying path integration may call for (and enable) corresponding refinements of our model.
Very little is known about synaptic connectivity between grid cells in the mEC, especially for cells belonging functionally to different modules. An important conclusion of our work is that synaptic connectivity between different modules may be beneficial for dynamically stabilizing the grid cell representation during path integration and memory maintenance. The specific form of connectivity that we identify is appealing for several reasons: first, it involves broad, relatively unstructured connectivity between grid cells, that depends only on their preferred heading preference. A second appealing feature of our proposed architecture is that it is sufficient to couple grid cells from modules with adjacent spacings, to achieve the desired stabilization of the grid cell representation. Since there is a relationship between grid spacing and position along the dorsal-ventral axis (Hafting et al., 2005; Stensola et al., 2012), all-to-all couplings between modules would require long-range connectivity within the mEC. Recent evidence (Fu et al., 2018) hints that synaptic connections between excitatory cells in the mEC may be more limited in range, but of sufficient spatial extent to allow for coupling of adjacent modules.

Model and simulations details
The dynamics of the network are described by Eq. 3 (or by Eq. 22 for the Poisson spiking neuron case). The synaptic time constant = 10 ms, the transfer function ( ) = −1 max( , 0), and 0 = 3. All simulations were done using the Euler method for integration, with a time step = 0.1 ms.
Coupled modules Consider coupled modules. The firing rate (Eq. 3) of neuron from module is ( Fig. 2D): where is the velocity estimation of module (Eq. 4) , is the coupling strength from module to module ( is the self coupling strength of module ), and the sign ± is equal to + (−) if the neuron belongs to a right (left) sub-population. The proportionality factor is included to simplify the units of the coupling strengths . Note that ⋅ does not depend on the parameter . Thus, the choice of is of no consequence for the dynamics, and this parameter is included only for the sake of derivation convenience.
The coupling between the modules can be interpreted as arising from synaptic connectivity. This can be seen by re-writing Eq. 11 as: where the 2 × 2 coupling connectivity matrix is: In the case of = 2 we use the coupling parameters: = −20, 12 = 1 ≈ 14.14 and 21 =

Two dimensional modules
In two dimensions each module contains four sub-populations, and the synaptic activation vector is: Each sub-population contains 2 = 64 2 neurons, arranged on a parallelogram. The preferred phase of the 'th neuron in each sub-population is: where and are distributed uniformly in the interval [0, 1]. The connectivity matrix is now: where and ( ) = 4 2 exp − 2 2 2 − 1 .
The distance measure | ⋅ | P 2 is defined using periodic boundary conditions on the parallelogram (see Fig. 4A): |⃗ 1 − ⃗ 2 | 2 is the minimal distance between the two points ⃗ 1 and ⃗ 2 on the torus that is created by gluing the opposite edges of the parallelogram defined by the vertices 2 ) (Fig. 4A, compare with Eq. 10, used in the one dimensional case). The firing rate of neuron ∈ {sub-population R or L} from module is: where Firing rates of neurons from the up and down sub-populations are obtained from Eqs. 19-20 by replacing → , , → , and → . In each module, responses to horizontal and vertical velocity inputs are independent: the right and left sub-populations respond to the horizontal velocity inputs, and affect , while the up and down sub-populations respond to the vertical velocity inputs and affect . Hence, the two-dimensional response tensor separates into independent, horizontal and vertical components with the same structure as in the one dimensional case. Since the linear response tensor (in each direction) is identical to that of the one dimensional case, the coupling parameters are chosen in the same way in one and two dimensions.

External velocity input
The external velocity input to module (in two dimensions) in ∈ { , } direction is (Eq. 19): is a proportionality factor that depends on the module, ( ) is a the component of the animal's velocity in the direction, and , ( ) is a white noise process with ⟨ , ( ) , ′ ( ′ )⟩ = 2 ′ ( − ′ ). In Fig. 4B and Fig. 6B-C the external input is not noisy, so = 0. In Fig. 4D-E and Fig. 5 In Fig. 3E-F (one dimension) 1 = 0.06 and 2 = 0. In the simulations of Figs. 4-6 (two dimensions) 1 = 0.06 , 2 = 0.06 , 3 = 2 0.06. Thus, even without coupling of the modules, the spacing ratio is achieved by the ratios of the inputs strengths , and therefore we can compare between the coupled and uncoupled phases and readout .

Spiking network
In the case of spiking Poisson neurons Eq. 3 is replaced by: where is the spike train of neuron , and are the spike times. Each neuron generates spikes sampled from a Poisson distribution with a firing rate ( ), as defined in Eq. 3 (Stevens and Zador, 1996;Gerstner and Kistler, 2002;Burak and Fiete, 2012).

Decoding
Decoding of the animal's trajectory, based on spike trains, is performed in Figs. 5-6 using a decoder that sums spikes from recent history with an exponential temporal kernel (Mosheiff et al., 2017). In Fig. 5 we simulate a spike train for each neuron (Eq. 23), sampled from an inhomogeneous Poisson process with a firing rate ( ) (note that the the network dynamics are deterministic and spikes are used only in the readout process). In Fig. 6 the stochastic spike train is part of of the dynamics (Eq. 22). In both cases, the decoded location of the animal in time is (Eqs. S10, S11 and S13 in Mosheiff et al. (2017)): The summation is over all neurons. Here = 10ms for all modules. The integral in Eq. 24 yields an effective spike count of neuron , weighted in time using a decaying exponential kernel, and (⃗ ) is the receptive field of neuron at location ⃗ , measured separately from the firing rate of each neuron in the steady state of the dynamics.

Diffusion tensor
Consider a system of spiking Poisson neurons, and one dimensional modules. The internal noise introduces a diffusive drift. The network is now a single continuous attractor with dimension (see Appendix 4). Hence, the diffusion tensor is a × matrix, that can be calculated using Eq. S24 in Burak and Fiete (2012), for the dynamics of the dimensional attractor (Eqs. 73, 74 and 75 in Appendix 4): where the summation is over all neurons,̄ ( ⃗ ) is the firing rate of neuron in the steady state of the system, and ( ⃗ ) is the left null eigenvector of the dynamics (Eq. 74) corresponding to a phase shift in the direction of module (calculated numerically). Let us define the 2 dimensional vector: where is a constant, whose value is determined below. The projection of Eq. 26 on ⃗ 0 results in The second term on the right hand side of Eq. 31 vanishes due to the symmetry of ′ (̄ ) to exchange of the right and left sub-populations, the antisymmetry of 0 to this exchange, and the structure of the matrix . Using the definition of (Eq. 4), we can write: Thus, Eq. 31 becomes: In Eq. 34 we use again the symmetry ′ (̄ , ) = ′ (̄ , ). We set (Eq. 30) such that̃ = .
Using Eqs. 29 and 34: From Eq. (33) we see that the readout velocity is a convolution of ( ), with the filter Combining Eqs. 28 and 33 we conclude that Hence, is equal to the phase velocity, smoothed over a relatively short time scale set by the synaptic time constant ( = 10ms).
Next, we examine the structure of ⃗ + and ⃗ − . The left and right components of + are spatially antisymmetric with respect to the peak of the activity bump, and precisely vanish at the center of the bump (yellow trace in Appendix 1 Fig. 1B). The components of − are symmetric (blue and red traces). We note that the Poisson noise is expressed only within the relatively narrow extent of the activity bump (dashed line). Two consequences follow: first, ⃗ + ⋅ ⃗ is very small due to the nearly vanishing values of + on the activity bump. Second, the components of − are nearly constant in magnitude along the narrow extent of the bump, but have opposite signs in the left and right sub-populations. This magnitude can be evaluated by rewriting Eq. 35 as: Thus, the magnitude of − along the activity bump is approximately , and ⃗ − ⋅ ⃗ ≈ ⃗ 0 ⋅ ⃗ . Hence, Eq. 42 becomes:̇ We now project the linearized dynamics of Eq. 38 on the vector ⃗ 0 to obtain: Thus, is a convolution of the velocity phase with the exponential filer , with the synaptic time integration . Therefore: as in the case of phase velocity due to external inputs.

Linear response tensor
We next wish to understand how our system of coupled modules responds to an external velocity input ⃗ ( ). First, consider two modules, each in one dimension, that follow the dynamics of Eq. 3. The synaptic activation vector is now 4 dimensional and has the form: The firing rate of neuron from module 1 is (Fig. 2D): where the total velocity input is = 1 + 1 2 + 1 . A similar equation for the rate 2, is obtained by switching the indices 1 ↔ 2 in Eq. 48. The constant is a proportionality factor that could in principle be absorbed into the definition of the coupling strength parameters, but makes the units of the coupling strengths more convenient. Substitution of and Eq. 37 in Eq. 28 yields: 1 = 1 + 1 * ̇ 2 + * ̇ 1 .
Thus, the phase velocities are proportional to the external velocity input, plus the velocities of all other modules filtered in time, and coupled by the matrix C. If we assume that the velocity is sufficiently small, that the position can be regarded as fixed within the synaptic time scale, the convolution with the filter can be omitted (note that the integral of , Eq. 36 , is equal to unity), and we obtain: where is the linear response tensor.

Two modules
In the case of two modules and identical self coupling strength for both modules, we can easily find the eigenvalues ± and eigenvectors of . The coupling matrix in this case is: (53) Using Eq. 7 we obtain: The eigenvectors of are: Thus, the phase motion of the two modules can be represented as: where Here, ± are the joint or relative phases of the modules, respectively. We choose parameters such that + is large and − is small in order to obtain large joint motion and small relative motion between modules for any external input. Note that − ≈ 0 implies thaṫ − ≈ 0 (Eq. 56), and in this case (Eq. 57):̇ Thus, we can set the velocity ratio between modules (namely, the spacing ratio ), by the ratio of the coupling parameters, using Eq. 58. We choose We choose + = to maintain the same response to coordinated velocity inputs, as in the case of no coupling. Using Eq. 54 we see that this requires + √ 1 2 = 0, which implies and − = 1 − 2 32 of 37 To obtain small − , should be negative, with a large absolute magnitude. Note that a large positive magnitude of , although seems here as a suitable choice, would cause instability of the system (discussed in Appendix 3).
which is identical to the solution we found in Appendix 2.
Making the additional choice that only successive modules are coupled yields the solution: where there is a freedom in choosing 23 . In Figures 4,5 and 6 we chose 23 = 21 .
In principle, other constraints on the coupling strengths can be used instead of the requirement that < for all . For example, limiting the magnitude of all the coupling strengths ( 2 < ), instead of only the magnitude of the self couplings, yields solutions with smaller range of allowed parameters (in the case of multiple solutions), but otherwise identical structure. A different possible constraint is of the form ∑ , =1 2 < . This constraint results in a symmetric solution: ∝ ∑ 2 − . We prefer the constraint in Eq. 62 as we do not see a compelling reason to limit all of the coupling parameters together instead of limiting them separately.

Stability conditions
To map the stability conditions of the coupled network, we consider the dynamics of the velocity estimation ⃗ (Eq. 33). As Eq. 33 is obtained by projecting the full dynamics of the network on the vector ⃗ 0 , stable network would result in finite ⃗ . We substitute the total velocity input ⃗ = ⋅ ⃗ + ⃗ ( ) in Eq. 33 and obtain: Thus, in order to maintain the stability of ⃗ , as well as the stability of the full network, all eigenvalues of must be smaller than unity, and all eigenvalues of (Eq. 7) must be positive. and ∈ means that the th neuron belongs to module . Using this approach numerically, yields nearly identical results as Eq. 52.