How Informative Are Spatial CA3 Representations Established by the Dentate Gyrus?

In the mammalian hippocampus, the dentate gyrus (DG) is characterized by sparse and powerful unidirectional projections to CA3 pyramidal cells, the so-called mossy fibers. Mossy fiber synapses appear to duplicate, in terms of the information they convey, what CA3 cells already receive from entorhinal cortex layer II cells, which project both to the dentate gyrus and to CA3. Computational models of episodic memory have hypothesized that the function of the mossy fibers is to enforce a new, well separated pattern of activity onto CA3 cells, to represent a new memory, prevailing over the interference produced by the traces of older memories already stored on CA3 recurrent collateral connections. Can this hypothesis apply also to spatial representations, as described by recent neurophysiological recordings in rats? To address this issue quantitatively, we estimate the amount of information DG can impart on a new CA3 pattern of spatial activity, using both mathematical analysis and computer simulations of a simplified model. We confirm that, also in the spatial case, the observed sparse connectivity and level of activity are most appropriate for driving memory storage – and not to initiate retrieval. Surprisingly, the model also indicates that even when DG codes just for space, much of the information it passes on to CA3 acquires a non-spatial and episodic character, akin to that of a random number generator. It is suggested that further hippocampal processing is required to make full spatial use of DG inputs.


Introduction
The hippocampus presents the same organizaton across mammals, and distinct ones in reptiles and in birds. A most prominent and intriguing feature of the mammalian hippocampus is the dentate gyrus (DG). As reviewed in [1], the dentate gyrus is positioned as a sort of intermediate station in the information flow between the entorhinal cortex and the CA3 region of the hippocampus proper. Since CA3 receives also direct, perforant path connections from entorhinal cortex, the DG inputs to CA3, called mossy fibers, appear to essentially duplicate the information that CA3 can already receive directly from the source. What may be the function of such a duplication?
Within the view that the recurrent CA3 network operates as an autoassociative memory [2], [3], it has been suggested that the mossy fibers (MF) inputs are those that drive the storage of new representations, whereas the perforant path (PP) inputs relay the cue that initiates the retrieval of a previously stored representation, through attractor dynamics, due largely to recurrent connections (RC). Such a proposal is supported by a mathematical model which allows a rough estimate of the amount of information, in bits, that different inputs may impart to a new CA3 representation [4]. That model, however, is formulated in the Marr [5] framework of discrete memory states, each of which is represented by a single activity configuration or firing pattern.
Conversely, the prediction that MF inputs may be important for storage and not for retrieval has received tentative experimental support from experiments with spatial tasks, either the Morris water maze [6] or a dry maze [7]. Two-dimensional spatial representations, to be compatible with the attractor dynamics scenario, require a multiplicity of memory states, which approximate a 2D continuous manifold, isomorphic to the spatial environment to be represented. Moreover, there has to be of course a multiplicity of manifolds, to represent distinct environments with complete remapping from one to the other [8]. Attractor dynamics then occurs along the dimensions locally orthogonal to each manifold, as in the simplified ''multi-chart'' model [9], [10], whereas tangentially one expects marginal stability, allowing for small signals related to the movement of the animal, reflecting changing sensory cues as well as path integration, to displace a ''bump'' of activity on the manifold, as appropriate [9], [11].
Although the notion of a really continuous attractor manifold appears as a limit case, which can only be approximated by a network of finite size [12], [13], [14], [15], even the limit case raises the issue of how a 2D attractor manifold can be established. In the rodent hippocampus, the above theoretical suggestion and experimental evidence point at a dominant role of the dentate gyrus, but it has remained unclear how the dentate gyrus, with its MF projections to CA3, can drive the establishment not just of a discrete pattern of activity, as envisaged by [4], but of an entire spatial representation, in its full 2D glory. This paper reports the analysis of a simplified mathematical model aimed at addressing this issue in a quantitative, information theoretical fashion.
Such an analysis would have been difficult even only a few years ago, before the experimental discoveries that largely clarified, in the rodent, the nature of the spatial representations in the regions that feed into CA3. First, roughly half of the entorhinal PP inputs, those coming from layer II of the medial portion of entorhinal cortex, were found to be often in the form of grid cells, i.e. units that are activated when the animal is in one of multiple regions, arranged on a regular triangular grid [16]. Second, the sparse activity earlier described in DG granule cells [17] was found to be concentrated on cells also with multiple fields, but irregularly arranged in the environment [18]. These discoveries can now inform a simplified mathematical model, which would have earlier been based on ill-defined assumptions. Third, over the last decade neurogenesis in the adult dentate gyrus has been established as a quantitatively constrained but still significant phenomenon, stimulating novel ideas about its functional role [19]. The first and third of these phenomena will be considered in extended versions of our model, to be analysed elsewhere; here, we focus on the role of the multiple DG place fields in establishing novel CA3 representations.

A simplified mathematical model
The complete model considers the firing rate of a CA3 pyramidal cell, g i , to be determined by the firing rates fgg of other cells in CA3, which influence it through RC connections; by the firing rates fbg of DG granule cells, which feed into it through MF connections; by the firing rates fQg of layer II pyramidal cells in entorhinal cortex (medial and lateral), which project to CA3 through PP axons; and by various feedforward and feedback inhibitory units. A most important simplification is that the fine temporal dynamics, e.g. on theta and gamma time scales, is neglected altogether, so that with ''firing rate'' we mean an average over a time of order the theta period, a hundred msec or so. Very recent evidence indicates, in fact, that only one of two competing spatial representations tends to be active in CA3 within each theta period [Jezek et al, SfN abstract, 2009]. Information coding over shorter time scales would require anyway a more complex analysis, which is left to future refinements of the model.
For the different systems of connections, we assume the existence of anatomical synapses between any two cells to be represented by fixed binary matrices fc PP g,fc MF g,fc RC g taking 0 or 1 values, whereas the efficacy of those synapses to be described by matrices fJ PP g,fJ MF g,fJ RC g. Since they have been argued to have a minor influence on coding properties and storage capacity [20], consistent with the diffuse spatial firing of inhibitory interneurons [21], the effect of inhibition and of the current threshold for activating a cell are summarized into a subtractive term, of which we denote withT T the mean value across CA3 cells, and withd d i the deviation from the mean for a particular cell i.
Assuming finally a simple threshold-linear activation function [22] for the relation between the activating current and the output firing rate, we write where ½ : z indicates taking the sum inside the brackets if positive in value, and zero if negative, and g is a gain factor. The firing rates of the various populations are all assumed to depend on the positionx x of the animal, and the notation is chosen to minimize differences with our previous analyses of other components of the hippocampal system (e.g. [22], [23]).

The storage of a new representation
When the animal is exposed to a new environment, we make the drastic modelling assumption that the new CA3 representation be driven solely by MF inputs, while PP and RC inputs provide interfering information, reflecting the storage of previous representations on those synaptic systems, i.e., noise. Such ''noise'' can in fact act as an undesired signal and bring about the retrieval of a previous, ''wrong'' representation, an interesting process which is not however analysed here. We reabsorb the mean of such noise into the mean of the ''threshold+inhibition'' termT T and similarly for the deviation from the mean. We use the same symbols for the new variables incorporating RC and PP interference, but removing in both cases the ''*'' sign, thus writing where the gain has been set to g~1, without loss of generality, by an appropriate choice of the units in which to measure fc MF g,fJ MF g (pure numbers) and d i ,T (s {1 ). As for the MF inputs, we consider a couple of simplified models that capture the essential finding by [18], of the irregularly arranged multiple fields, as well as the observed low activity level of DG granule cells [24], while retaining the mathematical simplicity that favours an analytical treatment. We thus assume that only a randomly selected fraction p DG of the granule cells are active in a new environment, of size A, and that those units are active in a variable number Q j of locations, with Q j drawn from a distribution with mean q. In model A, which we take as our reference, the distribution is taken to be Poisson (the data reported by Leutgeb et al [18] are fit very well by a Poisson distribution with q~1:7, but their sampling is limited). In model B, which we use as a variant, the distribution is taken to be exponential (this better describes the results of the simulations in [25], though that simple model may well be inappropriate). Therefore, in either model, the

Author Summary
The CA3 region at the core of the hippocampus, a structure crucial to memory formation, presents one striking anatomical feature. Its neurons receive many thousands of weak inputs from other sources, but only a few tens of very strong inputs from the neurons in the directly preceding region, the dentate gyrus. It had been proposed that such sparse connectivity helps the dentate gyrus to drive CA3 activity during the storage of new memories, but why it needs to be so sparse had remained unclear. Recent recordings of neuronal activity in the dentate gyrus (Leutgeb, et al. 2007) show the firing maps of granule cells of rodents engaged in exploration: the few cells active in a given environment, about 3% of the total, present multiple firing fields. Following these findings, we could now construct a network model that addresses the question quantitatively. Both mathematical analysis and computer simulations of the model show that, while the memory system would function also otherwise, connections as sparse as those observed make it function optimally, in terms of the bits of information new memories contain. Much of this information, we show, is encoded however in a difficult format, suggesting that other regions of the hippocampus, until now with no clear role, may contribute to decode it.
ð1Þ firing rate b j (x x) of DG unit j is a combination of Q j gaussian ''bumps'', or fields, of equal effective size (s f ) 2 and equal height b 0 , centered at random pointsx x jk in the new environment The informative inputs driving the firing of a CA3 pyramidal cell, during storage of a new representation, result therefore from a combination of three distributions, in the model. The first, Poisson but close to normal, determines the MF connectivity, that is how it is that each CA3 unit receives only a few tens of connections out of N DG^1 0 6 granule cells (in the rat), whereby c MF C MF =N DG :c MF . The second, Poisson, determines which of the DG units presynaptic to a CA3 unit is active in the new environment, with P(unit j is active)~p DG . The third, either Poisson or exponential (and see model C below), determines how many fields an active DG unit has in the new environment. Note that in the rat C MF^4 6 [26] whereas p DG &0:02{0:05, even when considering presumed newborn neurons [24]. As a result, the total number of active DG units presynaptic to a given CA3 unit, p DG C MF :a, is of order one, a*1{2, so that the second Poisson distribution effectively dominates over the first, and the number of active MF impinging on a CA3 unit can approximately be taken to be itself a Poisson variable with mean a. As a qualification to such an approximation, one has to consider that different CA3 pyramidal cells, among the N CA3^3 |10 5 present in the rat (on each side), occasionally receive inputs from the same active DG granule cells, but rarely, as N DG^1 0 6 , hence the pool of active units p DG N DG is only one order of magnitude smaller than the population of receiving units N CA3 .
In a further simplification, we consider the MF synaptic weights to be uniform in value, J MF ij :J. This assumption, like those of equal height and width of the DG firing fields, is convenient for the analytical treatment but not necessary for the simulations. It will be relaxed later, in the computer simulations addressing the effect of MF synaptic plasticity.
The new representation is therefore taken to be established by an informative signal coming from the dentate gyrus modulated, independently for each CA3 unit, by a noise term d i , reflecting recurrent and perforant path inputs as well as other sources of variability, and which we take to be normally distributed with zero mean and standard deviation d.
The positionx x of the animal determines the firing fbg of DG units, which in turn determine the probability distribution for the firing rate of any given CA3 pyramidal unit is the integral of the gaussian noise up to given signal-to-noise ratio and H(g) is Heaviside's function vanishing for negative values of its argument. The first term, multiplying Dirac's d g i ð Þ, expresses the fact that negative activation values result in zero firing rates, rather than negative rates.
Note that the resulting sparsity, i.e. how many of the CA3 units end up firing significantly at each position, which is a main factor affecting memory storage [21], is determined by the threshold T, once the other parameters have been set. The approach taken here is to assume that the system requires the new representation to be sparse and regulates the threshold accordingly. We therefore set the sparsity parameter a CA3~0 :1, in broad agreement with experimental data [14], and adjust T (as shown, for the mathematical analysis, in the third section of the Methods).
The distribution of fields per DG unit is given in model A by the Poisson form in model B by the exponential form and we also consider, as another variant, model C, where each DG unit has one and only one field P C (Q)~d 1Q :

Assessing spatial information content
In the model, spatial positionx x is represented by CA3 units, whose activity is informed about position by the activity of DG units. The activity of each DG unit is determined independently of others by its place fields where each contributing field is a gaussian bump The Mutual Information Ix x,fg i g ð Þquantifies the efficiency with which CA3 activity codes for position, on average, as where the outer brackets S : T indicate that the average is not just over the noise d, as usual in the estimation of mutual information, but also, in our case, over the quenched, i.e. constant but unknown values of the microscopic quantities c ij , the connectivity matrix, Q j , the number of fields per active unit, andx x jk , their centers. For given values of the quenched variables, the total entropy H 1 and the (average) equivocation H 2 are defined as where A is the area of the given environment; the logs are intended in base 2, to yield information values in bits.
The estimation of the mutual information can be approached analytically directly from these formulas, using the replica trick (see [27]), as shown by [28] and [29], and briefly described in the first section of the Methods. As in those two studies, however, here too we are only able to complete the derivation in the limit of low signal-to-noise, or more precisely of limited variation, across space, of the signal-to-noise around its mean, that is v(r i (x x){vr i (x x)wx x ) 2 wx x ?0. In this case we obtain, to first order in N:N CA3 , an expression that can be shown to be equivalent to where we use the notation s(r)~(1= ffiffiffiffiffi ffi 2p p ) exp{r 2 =2 (cp. [29], Eqs. 17,45).
Being limited to the first order in N, the expression above can be obtained in a straightforward manner by directly expanding the logarithms, in the large noise limit d??, in the simpler formula quantifying the information conveyed by a single CA3 unit This single-unit formula cannot quantify the higher-order contributions in N, which decrease the information conveyed by a population in which some of the units inevitably convey some of the same information. The replica derivation, instead, in principle would allow one to take into proper account such correlated selectivity, which ultimately results in the information conveyed by large CA3 populations not scaling up linearly with N, and saturating instead once enough CA3 units have been sampled, as shown in related models by [28], [29]. In our case however the calculation of e.g. the second order terms in N is further complicated by the fact that different CA3 units receive inputs coming from partially overlapping subsets of DG units. This may cause saturation at a lower level, once all DG units have been effectively sampled. The interested reader can follow the derivation sketched in the Methods.
Having to take, in any case, the large noise limit implies that the resulting formula is not really applicable to neuronally plausible values of the parameters, but only to the uninteresting case in which DG units impart very little information onto CA3 units. Therefore we use only the single-unit formula, and resort to computer simulations to assess the effects of correlated DG inputs. The second and third sections of the Methods indicate how to obtain numerical results by evaluating the expression in Eq. 9.
Computer simulations can be used to estimate the information present in samples of CA3 units of arbitrary size, and at arbitrary levels of noise, but at the price of an indirect decoding procedure. A decoding step is required because the dimensionality of the space spanned by the CA3 activity fg i g is too high. It increases in fact exponentially with the number N of neurons sampled, as M N , where M is the number of possible responses of each neuron. The decoding method we use, described in the fourth section of the Methods, leads to two different types of information estimates, based on either the full or reduced localization matrix. The difference between the two, and between them and the analytical estimate, is illustrated under Results and further discussed at the end of the paper.

Results
The essential mechanism described by the model is very simple, as illustrated in Fig. 1. CA3 units which happen to receive a few DG overlapping fields combine them in a resulting field of their own, that can survive thresholding. The devil is in the quantitative details: what proportion of CA3 cells express place fields, how large are the fields, and how strong are the fields compared with the noise, all factors that determine the information contained in the spatial representation. Note that a given CA3 unit can express multiple fields.
It is convenient to discuss such quantitative details with reference to a standard set of parameters. Our model of reference is a network of DG units with fields represented by Gaussian-like functions of space, with the number of fields per each DG units given by a Poisson distribution with mean value q, and parameters as specified in Table 1.
In general, the stronger the mean DG input, the more it dominates over the noise, and also the higher the threshold has to be set in CA3 to make the pattern of activity as sparse as required, by fixing a CA3~0 :1. To control for the trivial advantage of a higher signal-to-noise, we perform comparisons in which it is kept fixed, by adjusting e.g. the MF synaptic strength J.

Multiple input cells vs. multiple fields per cell
The first parameter we considered is q, the average number of fields for each DG unit, in light of the recent finding that DG units active in a restricted environment are more likely to have multiple fields than CA3 units, and much more often than expected, given their weak probability of being active [18]. We wondered whether receiving multiple fields from the same input units would be advantageous for CA3, and if so whether there is an optimal q value. We therefore estimated the mutual information when q varies and m, the total mean number of DG fields that each CA3 cell ð8Þ ð9Þ receives as input, is kept fixed, by varying C MF correspondigly. As shown in Fig. 2, varying q in this manner makes very little difference in the bits conveyed by each CA3 cell. This figure reports the results of computer simulations, that illustrate also the dependence of the mutual information on N CA3 , the number of cells sampled. The dependence is sub-linear, but rather smooth, with significant fluctuations from sample-to-sample which are largely averaged out in the graph. The different lines correspond to different distributions of the input DG fields among active DG cells projecting to CA3, that is different combinations of values for q and C MF~m =(qp DG ), with m kept constant; these different distributions do not affect much the information in the representation.
The analytical estimate of the information per CA3 unit confirms that there is no dependence on q (Fig. 2, inset). This is not a trivial result, as it would be if only the parameter m entered the analytical expression. Instead, the second section of the Methods shows that the parameters C m of the m-field decomposition depend separately on q and a:p DG C MF , so the fact that the two separate dependencies almost cancel out in a single dependence on their product, m, is remarkable. Moreover, such analytical estimate of the information conveyed by one unit does not match the first datapoints, for N CA3~1 , extracted from the computer simulation; it is not higher, as might have been expected considering that the simulation requires an additional information loosing decoding step, but lower, by over a factor of 2. The finding that the analytical estimate differs from, and is in fact much lower than, the slope parameter extracted from the simulations, after the decoding step, is further discussed below. Despite their incongruity in absolute values, neither the estimate derived from the simulations   nor the analytical estimate have separate dependencies on q and a, as shown in Fig. 2.

More MF connections, but weaker
Motivated by the striking sparsity of MF connections, compared to the thousands of RC and PP synaptic connections impinging on CA3 cells in the rat, we have then tested the effect of changing C MF without changing q. In order to vary the mean number of DG units that project to a single CA3 unit, while keeping constant the total mean input strength, assumed to be an independent biophysically constrained parameter, we varied inversely to C MF the synaptic strength parameter J. As shown in Fig. 3, the information presents a maximum at some intermediate value C MF^2 0{30, which is observed both in simulations and in the analytical estimate, despite the fact that again they differ by more than a factor of two.
Again we find that the analytical estimate differs from, and is in fact much lower than, the slope parameter extracted from the simulations, after the decoding ste. Both measures, however, show that the standard model is not indifferent to how sparse are the MF connections. If they are very sparse, most CA3 units receive no inputs from active DG units, and the competition induced by the sparsity constraint tends to be won, at any point in space, by those few CA3 units that are receiving input from just one active DG unit. The resulting mapping is effectively one-to-one, unit-tounit, and this is not optimal information-wise, because too few CA3 units are active -many of them in fact have multiple fields (Fig. 4, right), reflecting the multiple fields of their ''parent'' units in DG. As C MF increases (with a corresponding decrease in MF synaptic weight), the units that win the competition tend to be those that summate inputs from two or more concurrently active DG units. The mapping ceases to be one-to-one, and this increases the amount of information, up to a point. When C MF is large enough that CA3 units begin to sample more effectively DG activity, those that win the competition tend to be the ''happy few'' that happen to summate several active DG inputs, and this tends to occur at only one place in the environment. As a result, an ever smaller fraction of CA3 units have place fields, and those tend to have just one, often very irregular, as shown in Fig. 4, right. From that point on, the information in the representation decreases monotonically. The optimal MF connectivity is then in the range which maximizes the fraction of CA3 units that have a field in the newly learned environment, at a value, roughly one third, broadly consistent with experimental data (see e.g. [30]).
It is important to emphasize that what we are reporting is a quantitative effect: the underlying mechanism is always the same, the random summation of inputs from active DG units. DG in the model effectively operates as a sort of random number generator, whatever the values of the various parameters. How informative are the CA3 representations established by that random number generator, however, depends on the values of the parameters.

Other DG field distribution models
We repeated the simulations using other models for the DG fields distribution, the exponential (model B) and the single field one (model C), and the results are similar to those obtained for model A: the information has a maximum when varying C MF on its own, and is instead roughly constant if the parameter m is held constant (by varying q inversely to C MF ). Fig. 5 reports the comparison, as C MF varies, between models A and B, with q~1:7, and model C, where q:1, so that in this latter case the inputs are 1/1.7 times weaker (we did not compensate by multiplying J by 1.7). Information measures are obtained by decoding several samples of 10 units, averaging and dividing by 10, and not by extracting the fit parameters. As one can see, the lower mean input for model C leads to lower information values, but the trend with C MF is the same in all three models. This further indicates that the multiplicity of fields in DG units, as well as its exact distribution, is of no major consequence, if comparisons are made keeping constant the mean number of fields in the input to a CA3 unit.

Sparsity of DG activity
We study also how the level of DG activity affects the information flow. We choose diffferent values for the probability p DG that a single DG unit fires in the given environment, and again we adjust the synaptic weight J to keep the mean DG input per CA3 cell constant across the comparisons.
Results are simular to those obtained varying the sparsity of the MF connections (Fig. 6). Indeed, the analytical estimate in the two conditions would be exactly the same, within the approximation with which we compute it, because the two parameters p DG and C MF enter the calculation in equivalent form, as a product. The actual difference between the two parameters stems from the fact that increasing C MF , CA3 units end up sampling more and more the same limited population of active DG units, while increasing p DG this population increases in size. This difference can only be appreciated from the simulations, which however show that the main effect remains the same: an information maximum for rather sparse DG activity (and sparse MF connections), The subtle difference between varying the two parameters can be seen better in the saturation information value: with reference to the standard case, in the center of the graph in the inset, to the right increasing p DG leads to more information than increasing C MF , while to the left the opposite is the case, as expected.

Full and simplified decoding procedures
As noted above, we find that the analytical estimate of the information per unit is always considerably lower than the slope parameter of the fit to the measures extracted from the simulations, contrary to expectations, since the latter require an additional decoding step, which implies some loss of information. We also find, however, that the measures of mutual information that we extract from the simulations are strongly dependent on the method used, in the decoding step, to construct the ''localization matrix'', i.e. the matrix which compiles the frequency with which the virtual rat was decoded as being in positionx x 0 when it was actually in positionx x. All measures reported so far, from simulations, are obtained constructing what we call the full localization matrix Q(x x,x x 0 ) which, if the square environment is discretized into 20|20 spatial bins, is a large 400|400 matrix, which requires of order 160,000 decoding events to be effectively sampled. We run simulations with trajectories of 400,000 steps, and additionally corrected the information measures to avoid the limited sampling bias [31].
An alternative, that allows extracting unbiased measures from much shorter simulations, is to construct a simplified matrix Q Q(x x{x x 0 ), which averages over decoding events with the same vector displacement between actual and decoded positions. Q Q(x x{x x 0 ) is easily constructed on the torus we used in all simulations, and being a much smaller 20|20 matrix it is effectively sampled in just a few thousand steps.
The two decoding procedures, given that the simplified matrix is the shifted average of the rows of the full matrix, might be expected to yield similar measures, but they do not, as shown in Fig. 7. The simplified matrix, by assuming translation invariance  Fig. 7 shows that such episodic information always prevails, whatever the connectivity, i.e. in all three parameter regimes illustrated in Fig. 4. The lower right panel in Fig. 7 compares, instead, the entropies of the decoded positions with the two matrices, conditioned on the actual position -that is, the equivocation values. Unlike the mutual information, such equivocation is much higher for the simplified matrix; for this matrix, it is simply a measure of how widely displaced are decoded positions, with respect to the actual positions, represented at the center of the square; and for small samples of units, which are not very informative, the ''displacement'' entropy approaches that of a flat distribution of decoded positions, i.e. log 2 (400)^8:64 bits. For larger samples, which enable better localization, the simplified localization matrix begins to be clustered in a Gaussian bell around zero displacement, so that the equivocation gradually decreases (the list of displacements, with their frequencies, is computed for each sample, and it is the equivocation, not the list itself, which is averaged across samples).
In contrast, the entropy of each row of the full localization matrix, i.e. the entropy of decoded positions conditioned on any actual position, is lower, and also decreasing more steeply with sample size; it differs from the full entropy, in fact, by the mutual information between decoded and actual positions, which increases with sample size. The two equivocation measures therefore both add up to the two mutual information measures to yield the same full entropy of about 8.64 bits (a bit less in the case of the full matrix, where the sampling is more limited), and thus serve as controls that the difference in mutual information is not due, for example, to inaccuracy. As a third crucial control, we calculated also the average conditional entropy of the full localization matrix, when the matrix is averaged across samples of a given size: the resulting entropy is virtually identical to the displacement entropy (which implies instead an average of the full matrix across rows, i.e. across actual positions). This indicates that different samples of units express distinct episodic content at each location, such that averaging across samples is equivalent to averaging across locations. Apparently, also the analytical estimate is unable to capture the spatial information implicit in such ''episodic'' errors, as its values are well below those obtained with the full matrix, and somewhat above those obtained with the simplified matrix (consistent with some loss with decoding). One may wonder how can the information from the full localization matrix (which also requires a decoding step) be higher than the decoding-free analytical estimate, without violating the basic information processing theorem. The solution to the riddle, as we understand it, is subtle: when decoding, one takes essentially a maximum likelihood estimate, assigning a unique decoded position per trial, or time step. This leads to a ''quantized'' localization matrix, which in general tends to have substantially higher information content than the ''smoothed'' matrix based on probabilities [32]. In the analytical derivation there is no concept of trial, time step or maximal likelihood, and the matrix expresses smoothly varying probabilities. The more technical implications are discussed further at the end of the Methods. These differences do not alter the other results of our study, since they affect the height of the curves, not their shape, however they have important implications. The simplified matrix has the advantage of requiring much less data, i.e. less simulation time, but also less real data if applied to neurophysiological recordings, than the full matrix, and in most situations it might be the only feasible measure of spatial information (the analytical estimate is not available of course for real data). So in most cases it is only practical to measure spatial information with methods that, our model suggests, miss out much of the information present in neuronal activity, what we may refer to as ''dark information'', not easily revealed. One might conjecture that the prevalence of dark information is linked to the random nature of the spatial code established by DG inputs. It might be that additional stages of hippocampal processing, either with the refinement of recurrent CA3 connections or in CA1, are instrumental in making dark information more transparent.

Effect of learning on the mossy fibers
While the results reported this far assume that MF weights are fixed, J~1, we have also conducted a preliminary analysis of how the amount of spatial information in CA3 might change as a consequence of plasticity on the mossy fibers. In an extension of the standard model, we allow the weights of the connections between DG and CA3 to change with a model ''Hebbian'' rule. This is not an attempt to capture the nature of MF plasticity, which is not NMDA-dependent and might not be associative [33], but only the adoption of a simple plasticity model that we use in other simulations. At each time step (that corresponds to a different place in space) weights are taken to change as follows: where c MF is a plasticity factor that regulates the amount of learning. Modifying in this way the MF weights has the general effect of increasing information values, so that they approach saturation levels for lower number of CA3 cells; in particular this is true for the information extracted from both full and simplified matrices. In Fig. 8, the effect of such ''learning'' is shown for different values of the parameter c MF , as a function of connectivity. We see that allowing for this type of plasticity on mossy fibers leads to shift the maximum of information as a function of the connectivity level. The structuring of the weights effectively results in the selection of favorite input connections, for each CA3 unit, among a pool of availables ones; the remaining strong connections are a subset of those ''anatomically'' present originally. It is logical, then, that starting with a larger pool of connnections, among which to pick the ''right'' ones, leads to more information than starting with few connections, which further decrease in effective number with plasticity. We expect better models of the details of MF plasticity to preserve this main effect. A further effect of learning, along with the disappearance of some CA3 fields and the strengthening of others, is the refinement of their shape, as illustrated in Fig. 9. It is likely that also this effect will be observed even when using more biologically accurate models of MF plasticity.

Retrieval abilities
Finally, all simulations reported so far involved a full complement of DG inputs at each time step in the simulation. We have also tested the ability of the MF network to retrieve a spatial representation when fed with a degraded input signal, with and without MF plasticity. The input is degraded, in our simulation, simply by turning on only a given fraction, randomly selected, of the DG units that would normally be active in the environment. The information extracted after decoding by a sample of units (in Fig. 10, 10 units) is then contrasted with the size of the cue itself. In the absence of MF plasticity, there is obviously no real retrieval process to talk about, and the DG-CA3 network simply relays partial information. When Hebbian plasticity is turned on, the expectation from similar network models (see e.g. [34], Fig. 9) is that there would be some pattern completion, i.e. some tendency for the network to express nearly complete output information when the input is partial, resulting in a more sigmoidal input-output curve (the exact shape of the curve depends of course also on the particular measure used).
It is apparent from Fig. 10 that while, in the absence of plasticity, both parameters characterizing the information that can be extracted from CA3 grow roughly linearly with the size of the cue, with plasticity the growth is supralinear. This amounts to the statement that the beneficial effects of plasticity require a full cue to be felt -the conceptual opposite to pattern completion, the process of integrating a partial cue using information stored on modified synaptic weights. This result suggests that the sparse MF connectivity is sub-optimal for the associative storage that leads to pattern completion, a role that current perspectives ascribe instead to perforant path and recurrent connections to CA3. The role of the mossy fibers, even if plastic, may be limited to the establishment of new spatial representations.

Discussion
Ours is a minimal model, which by design overlooks several of the elements likely to play an important role in the functions of the dentate gyrus -perhaps foremost, neurogenesis [35]. Nevertheless, by virtue of its simplicity, the model helps clarify a number of quantitative issues that are important in refining a theoretical perspective of how the dentate gyrus may work. First, the model indicates that the recently discovered multiplicity of place fields by active dentate granule cells [18] might be just a ''fact of life'', with no major computational implications for dentate information processing. Still, requiring that active granule cells express multiple fields seems to lead, in another simple network model (of how dentate activity may result from entorhinal cortex input [25]), to the necessity of inputs   coming from lateral EC, as well as from medial EC. The lateral EC inputs need not carry any spatial information but help to select the DG cells active in one environment. Thus the multiplicity of DG fields refines the computational constraints on the operation of hippocampal circuits. Second, the model shows that, assuming a fixed total MF input strength on CA3 units, it is beneficial in information terms for the MF connectivity to be very sparse; but not vanishingly sparse. The optimal number of anatomical MF connections on CA3 units, designated as C MF in the model, depends somewhat on the various parameters (the noise in the system, how sparse is the activity in DG and CA3, etc.) and it may increase slightly when taking MF plasticity into account, but it appears within the range of the number, 46, reported for the rat by [26]. It will be interesting to see whether future measures of MF connectivity in other species correspond to those ''predicted'' by our model once the appropriate values of the other parameters are also experimentally measured and inserted into the model. A similar set of consideration applies to the fraction of granule cells active in a given environment, p DG , which in the model plays a similar, though not completely identical, role to C MF in determining information content. Third, the model confirms that the sparse MF connections, even when endowed with associative plasticity, are not appropriate as devices to store associations between input and output patterns of activity -they are just too sparse. This reinforces the earlier theoretical view [2], [4], which was not based however on an analysis of spatial representations, that the role of the dentate gyrus is in establishing new CA3 representations and not in associating them to representations expressed elsewhere in the system. Availing itself of more precise experimental paramaters, and based on the spatial analysis, the current model can refine the earlier theoretical view and correct, for example, the notion that ''detonator'' synapses, firing CA3 cells on a one-to-one basis, would be optimal for the mossy fiber system. The optimal situation turns out to be the one in which CA3 units are fired by the combination of a couple of DG input units, although this is only a statistical statement. Whatever the exact distribution of the number of coincident inputs to CA3, DG can be seen as a sort of random pattern generator, that sets up a CA3 pattern of activity without any structure that can be related to its anatomical lay-out [36], or to the identity of the entorhinal cortex units that have activated the dentate gyrus. As with random number generators in digital computers, once the product has been spit out, the exact process that led to it can be forgotten. This is consistent with experimental evidence that inactivating MF transmission or lesioning the DG does not lead to hippocampal memory impairments once the information has already been stored, but leads to impairments in the storage of new information [6], [7]. The inability of MF connection to subserve pattern completion is also consistent with suggestive evidence from imaging studies with human subjects [37].
Fourth, and more novel, our findings imply that a substantial fraction of the information content of a spatial CA3 representation, over half when sampling limited subsets of CA3 units, can neither be extracted through the simplified method which assumes translation invariance, nor assessed through the analytical method (which anyway requires an underlying model of neuronal firing, and is hence only indirectly applicable to real neuronal data). This large fraction of the information content is only extracted through the time-consuming construction of the full localization matrix. To avoid the limited sampling bias [38] this would require, in our hands, the equivalent of a ten hour session of recording from a running rat (!), with a square box sampled in 20|20 spatial bins.
We have hence labeled this large fraction as dark information, which requires a special effort to reveal. Although we know little of how the real system decodes its own activity, e.g. in downstream neuronal populations, we may hypothesize that the difficulty at extracting dark information affects the real system as well, and that successive stages of hippocampal processing have evolved to address this issue. If so, qualitatively this could be characterized as the representation established in CA3 being episodic, i.e. based on an effectively random process that is functionally forgotten once completed, and later processing, e.g. in CA1, may be thought to gradually endow the representations with their appropriate continuous spatial character. Another network model, intended to elucidate how CA1 could operate in this respect, is the object of our on-going analysis.
The model analysed here does not include neurogenesis, a most striking dentate phenomenon, and thus it cannot comment on several intriguing models that have been put forward about the role of neurogenesis in the adult mammalian hippocampus [39], [40], [41]. Nevertheless, presenting a simple and readily expandable model of dentate operation can facilitate the development of further models that address neurogenesis, and help interpret puzzling experimental observations. For example, the idea that once matured newborn cells may temporally ''label'' memories of episodes occurring over a few weeks [42], [43], [44], [45] has been weakened by the observation that apparently even young adult-born cells, which are not that many [45], [46], [47], are very sparsely active, perhaps only a factor of two or so more active than older granule cells [24]. Maybe such skepticism should be reconsidered, and the issue reanalysed using a quantitative model like ours. One could then investigate the notion that the new cells link together, rather than separating, patterns of activity with common elements (such as the temporal label). To do that clearly requires extending the model to include a description not only of neurogenesis, but also of plasticity within DG itself [48] and of its role in the establishment of successive representations one after the other.

Replica calculation
Estimation of the equivocation. Calculating the equivocation from its definition in Eq.7 is straightforward, thanks to the simplifying assumption of independent noise in CA3 units. We get where although the spatial integral remains to be carried out.
Estimation of the entropy. For the entropy, Eq.6, the calculation is more complicated. Starting from ! we remove the logarithm using the replica trick (see [27]) which can be rewritten (Nadal and Parga [49] have shown how to use the replica trick in the n?1 limit, a suggestion used in [50] to analyse information transfer in the CA3-CA1 system) using the spatial averages, defined for an arbitrary real-valued number n of replicas where we have defined a quantity dependent on both the number n of replicas and on the position in space, later to be integrated over, of each replica b: We need therefore to carry out integrals over the firing rate of each CA3 unit, g i , in order to estimate h i fx x b g, n À Á , while keeping in mind that in the end we want to take n?1. Carrying out the integrals yields a below-threshold and an above-threshold term where we have defined the quantities and g o~( 1=n) 2 , hence in the product over cells, that defines the entropy H 1 fg i g ð Þ, the only terms that survive in the limit n?1 would just be the summed single-unit contributions obtained from the first derivatives with respect to n. This is not true, however, as taking the replica limit produces the counterintuitive effect that replicatensor products of terms, which individually disappear for n?1, only vanish to first order in n{1, as shown by [29]. The replica method is therefore able, in principle, to quantify the effect of correlations among units, expressed in entropy terms stemming from the product of h i across units.
Briefly, one has where the first two rows come from the term below threshold, and the last two from the one above threshold. Then, following [29], and where we have considered that in the limit n?1 we have g o =d:r o appear in all terms of finite weight. The products between the matrices G a,b attached to each CA3 unit generate the higher order terms in N. Calculating them in our case, in which different CA3 units can receive partially overlapping inputs from DG units, is extremely complex (see [51], where information transmission across a network is also considered), and we do not pursue here the analysis of such higher order terms. One can retrieve the result of the TG model in Ref. [29] by taking the further limit r o ?0, which implies W(r o )?1=2 ð12Þ and s 2 (r o )?1=(2p). A further subtlety is that, in taking the n?1 limit, there is a single replica, sayx x, which is counted once in the limit, but also several different replicas, denotedx x 0 ,x x 00 , . . ., whose weights vanish, but which remain to determine e.g. the terms proportional to (n{1) emerging from the derivatives. Thus, in the very last term of Eq. 17, one has to derive r o with respect to n to produce the C term of Eq. 19, which is absent in [29] because it vanishes with r o . In the off-diagonal terms of the G matrix there are 2(n{1) entries dependent on replicasx x andx x 0 , and (n{1)(n{2) entries dependent on replicasx x 0 andx x 00 .
Focusing now solely on terms of order N, note that the term S is effectively a spatial signal. In the n?1 limit it can be rewritten, usingx x for the single surviving replica, as This allows us to derive, to order N, our result for the spatial information content, Eq. 8. Note that when the threshold of each unit tends to {?, and therefore its mean activation r i ??, our units behave as threshold-less linear units with gaussian noise, and the information they convey tends to which is simply expressed in terms of a spatial signal-to-noise ratio, and coincides with the results in Refs. [28], [29].

m-Field decomposition
Eqs. 8 and 9 simply sum equivalent average contributions from each CA3 unit. Each such contribution can then be calculated as a series in m, the number of DG fields feeding into the CA3 unit. One can in fact write, for example, where in each term there are c active DG units, indexed by j, presynaptic to CA3 unit i, and each has Q j fields (including the possibility that Q j~0 ), indexed by k. A similar expansion can be written for the other terms. One then realizes that the spatial component reduces to integrals that depend solely on the total number of fields m~P n j Q j , no matter how many DG active units they come from, and the expansion can be rearranged into an expansion in m SIx x,fg i g ð ÞT~N 2 ln 2 where one of the components in each term is, for example, with r(x x 0 ,fx =d the mean signal-tonoise at positionx x produced by m fields, from no matter how many DG units. The numerical coefficient C m , instead, stems from the combination of the distribution for the number of fields for each presynaptic DG unit active in the environment, which differs between models A, B and C, and the Poisson distribution for the number of such units where K 0 :1 and the other K m (l) are the polynomials . .  For model B, and combining the Poisson with the exponential series one finds C m~e where againK K 0 :1, while the otherK K m (l l) are the distinct polynomialsK given by the further modified Khayyam-Tartaglia recursion relationC C(l, m)~C C(l{1, m{1)=lzC C(l, m{1) and wherel l~a=(1zq).
For model C, there is no parameter q (i.e., q:1), and one simply finds Note that in the limit q?0, when the mean input per CA3 unit m~aq remains finite, for both models A and B one finds which is equivalent to Eq. 25, in line with the fact that both models A and B reduce, in the q?0 limit, to single-field distributions, but even units with single fields become vanishingly rare, so formally one has to scale up the mean number of active presynaptic units, a, to keep m:aq finite and establish the correct comparison to model C.

Sparsity and threshold
The analytical relation between the threshold T of CA3 units and the sparsity aa CA3 of the layer is obtained starting from the formula defining the sparsity a CA3 (see below) which can be rewritten Since in the analytical calculation we have T as parameter, this equation can be taken as a relation aa CA3 (T) which has to be inverted to allow a comparison with the simulations, which are run controlling the sparsity level at a predefined level (in our case aa CA3~0 :1) and adjusting the threshold parameter accordingly. The inversion requires using the m-field decomposition and numerical integration. A graphical example of the numerical relation is given in Fig. 11.

Simulations
The mathematical model described above was simulated with a network of 15000 DG cells and 500 CA3 cells. A virtual rat explores a continuous two dimensional space, intended to represent a 1sqm square environment but realized as a torus, with periodic boundary conditions. For the numerical estimation of mutal information, the environment is discretized in a grid of 20|20 locations, whereas trajectories are in continuous space, but in discretized time steps. In each time step (intended to correspond to roughly 62:5ms, half a theta cycle) the virtual rat moves half a grid unit (2:5cm) in a direction similar to the direction of the previous time step, with a small amount of noise. To allow construction of a full localization matrix with good statistics, simulations are run for typically 400,000 time steps (while for the simplified translationally invariant matrix 5,000 steps would be sufficient). The space has periodic boundary conditions, as in a torus, to avoid border effects; the longest possible distance between any two locations is hence equal to 14.1 grid units, or 70cm.
DG place fields. After assigning a number of firing fields for each DG units, according to the distributions of models A, B and C, we assign to each field a randomly chosen center. The shape of the field is then given by a Gaussian bell with that center. The tails of the Gausssian function are truncated to zero when the distance from the center is larger than a fixed radius r~ffi ffiffiffiffi ffi fA p r , with f~0:1 the ratio between the area of the field and the environment area A. In the standard model, only about 3 percent of the DG units on average are active in a given environment, in agreement with experimental findings [24]; i.e. the DG firing probability is p DG~0 :033. The firing of DG units is not affected by noise, nor by any further threshold. Peak firing is conventionally set, in the center of the field, at the value r 2 2p~2 :02, but DG units can fire at higher levels if they are assigned two or more overlapping fields. CA3 activation. CA3 units fire according to Eq. 2: the firing of a CA3 unit is a linear function of the total incoming DG input, distorted by a noise term. This term is taken from a gaussian distribution centered on zero, with variance d~1, and it changes for each unit and each time step. A threshold is imposed in the simulations to model the action of inhibition, hypothesizing that it serves to adjust the sparsity a CA3 of CA3 activity to its required value. The sparsity is defined as g ix x ð ÞT 2 Sg 2 ix x ð ÞT and it is fixed to a CA3~0 :1. This implies that the activity of the CA3 cells population is under tight inhibitory control. The decoding procedure and information extraction. At each time step, the firing vector of a set of CA3 units is compared to all the average vectors recorded at each position in the 20|20 grid, for the same sample, in a test trial (these are called template vectors). The comparison is made calculating the Euclidean distance between the current vector and each template, and the position of the closest template is taken to be the decoded position at that time step, for that sample. This procedure has been termed maximum likelihood Euclidean distance decoding [32]. The frequency of each pair of decoded and real positions are compiled in a so-called ''confusion matrix'', or localization matrix, that reflects the ensemble of conditional probabilities P g i f gDx x ð Þ f gfor that set of units. Should decoding ''work'' in a perfect manner, in the sense of always detecting the correct position in space of the virtual rat, the confusion matrix would be the identity matrix. From the confusion matrix obtained at the end of the simulation, the amount of information is extracted, and plotted versus the number of CA3 units present in the set. We averaged extensively over CA3 samples, as there are large fluctuations from sample to sample, i.e. for each given number of CA3 units we randomly picked several different groups of CA3 units and then averaged the mutual information values obtained. In all the results reported we averaged also over 3-4 simulation run with a different random number generator, i.e. over different trajectories. The same procedure leading to the information curve was repeated for different values of the parameters. In all the information measures we reported, we also corrected for the limited sampling bias, as discussed by [31]. In our case of spatial information, the bias is essentially determined by the spatial binning we used (20|20) and by the decoding method [52].
One should note the maximum likelihood decoding procedure to better understand the discrepancy between the information estimated from simulations (with the procedure based on the full matrix) and that calculated analytically. The analytical calculation distinguishes in a clear-cut manner so called annealed variables, which are interpreted as ''fast'' noise and are averaged in computing the relation between position and neuronal activity, and so called quenched variables, which are interpreted as frozen disorder and are averaged over only later, in computing average the entropy, free-energy or mutual information [27]. In using maximum likelihood decoding, instead, the localization matrix that relates actual and decoding position effectively averages only trial-to-trial variability, i.e. the noise that occurs on intermediate time scales. The variability on genuinely fast time scales is suppressed, in fact, by the maximum likelihood operation, which acts as a sort of temporal low pass filter with a cut-off time equal to one time step. This suppression of part of the annealed noise leads to larger information values extracted from the simulations, and hence to the notion of ''dark'' information. In the real system, the spiking nature of neuronal activity may induce a similar cut-off, although its quantitative relation to the one-time-step cut-off in the simulations (here intended to be half a theta cycle) remains to be firmly established.
Fitting. We fit the information curves obtained in simulations to exponentially saturating curves as a function of N in order to get the values of the two most relevant parameter that describe their shape: the initial slope I 1 (i.e. the average information conveyed by the activity of individual units) and the total amount of information I ? (i.e. the asymptotic saturation value). The function we used for the fit is the following In most cases the fit was in excellent agreement with individual data points, as expected on the basis of previous analyses [28].