Common population codes produce extremely nonlinear neural manifolds

Significance

Information in the brain is collectively processed by very large populations of neurons. Finding shapes in neural population activity data has emerged as a powerful way to study how information is encoded and transformed by these large populations. Most common data analysis tools look for linear shapes in data, meaning lines, planes, and higher-dimensional flat analogues. We study two broad families of information-encoding strategies that are frequently observed in the brain. We show that the shapes corresponding to these information-encoding strategies are exceedingly nonlinear and thus would be completely missed by common data analysis methods.


S1 Linear dimension of mean-subtracted data is lower bounded by the linear dimension of non-mean-subtracted data minus 1

We consider a population of $N$ neurons with activities at time $t$ given by $y(t) = [y_1(t), y_2(t), \ldots, y_N(t)]$. The mean of the population activity is $\mu = E_t[y(t)]$, where the expectation value is taken over time.
The non-mean-subtracted covariance matrix is given by $C = E_t[y y^T]$, and the mean-subtracted covariance matrix is given by $T = E_t[(y - \mu)(y - \mu)^T]$. Note that $C = T + \mu\mu^T$. The matrices $C$, $T$ and $\mu\mu^T$ are Hermitian and positive semi-definite. Moreover, $\mu\mu^T$ is rank 1, with one non-zero eigenvalue $\|\mu\|^2$ and remaining eigenvalues 0. Let the eigenvalues of these matrices be denoted $\lambda_1 \geq \cdots \geq \lambda_N$ for $C$, $\rho_1 \geq \cdots \geq \rho_N$ for $T$, and $\nu_1 = \|\mu\|^2 \geq \nu_2 = \cdots = \nu_N = 0$ for $\mu\mu^T$. The Weyl inequalities for sums of Hermitian matrices, applied to $C = T + \mu\mu^T$, yield the eigenvalue bounds

$$\rho_p \leq \lambda_p \leq \rho_{p-1} \quad \text{for } p \geq 2, \qquad \rho_1 \leq \lambda_1 \leq \rho_1 + \|\mu\|^2. \tag{3}$$

Now assume that the $(1-\epsilon)$ linear dimension of the non-mean-subtracted system is $L$ and that $L$ is at least 2; that is, $L$ is the smallest integer for which $\sum_{k=1}^{L} \lambda_k \geq (1-\epsilon) \sum_{k=1}^{N} \lambda_k$. Define $\Delta = \sum_{k=1}^{L-2} (\lambda_k - \rho_k) + \lambda_{L-1}$. Note that $0 \leq \Delta \leq \lambda_1$ by the above inequalities. Moreover, since the traces satisfy $\sum_{k=1}^{N} (\lambda_k - \rho_k) = \|\mu\|^2$ and the Weyl bounds give $\sum_{k=L-1}^{N} (\lambda_k - \rho_k) \leq \lambda_{L-1}$, we also have $\Delta \geq \|\mu\|^2$. Thus

$$\sum_{k=1}^{L-2} \rho_k = \sum_{k=1}^{L-1} \lambda_k - \Delta < (1-\epsilon) \sum_{k=1}^{N} \lambda_k - \|\mu\|^2 \leq (1-\epsilon) \sum_{k=1}^{N} \rho_k.$$

Consequently, if the $(1-\epsilon)$-linear dimension of the non-mean-subtracted matrix $C$ is $L$, the $(1-\epsilon)$-linear dimension of the mean-subtracted matrix $T$ is at least $L - 1$ (also see numerical confirmation in Fig. S2). In general, depending on $\|\mu\|^2$, the linear dimension of $T$ can be much higher than that of $C$. Thus, lower bounds on the linear dimension of the non-mean-subtracted case transfer simply to lower bounds on the linear dimension of the mean-subtracted case.
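As an illustrative numerical check of this bound, the following minimal numpy sketch (added here for convenience; the synthetic data model and all parameters are arbitrary assumptions, not from the original analysis) compares the $(1-\epsilon)$ linear dimensions computed from $C$ and $T$:

```python
import numpy as np

def linear_dimension(eigvals, frac=0.95):
    # Smallest number of top eigenvalues whose sum reaches `frac` of the total.
    v = np.sort(np.clip(eigvals, 0, None))[::-1]
    return int(np.searchsorted(np.cumsum(v) / v.sum(), frac) + 1)

rng = np.random.default_rng(0)
N, T_steps = 60, 20000
# Illustrative data: a 10-dimensional latent signal plus a large fixed mean offset.
y = rng.normal(size=(T_steps, 10)) @ rng.normal(size=(10, N)) + 5.0 * rng.normal(size=N)

C = y.T @ y / T_steps        # non-mean-subtracted second-moment matrix
T = np.cov(y.T, bias=True)   # mean-subtracted covariance matrix

L_C = linear_dimension(np.linalg.eigvalsh(C))
L_T = linear_dimension(np.linalg.eigvalsh(T))
print(L_C, L_T)              # the bound predicts L_T >= L_C - 1
```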

S2 Translation-symmetric tuning curves
In this section, we derive the form of the covariance matrix for translation-symmetric tuning curves in Section S2.1 and show that its eigenvalues are given by the Fourier transform of the covariance profile in Section S2.2. We calculate the covariance matrix and its eigenvalues explicitly for translation-symmetric Gaussian tuning curves in Sections S2.3, S2.4, and S2.5. We show that the linear dimension grows exponentially with $D$ for Gaussian tuning curves in Section S2.5. Finally, we use an uncertainty principle relating functions and their Fourier transforms to show that the linear dimension for truly localized tuning curves grows supra-exponentially in Section S2.6.

S2.1 The covariance matrix of translation-symmetric data is translation-symmetric
We assume that the neural response depends on a $D$-dimensional latent variable, and thus the tuning centers $\alpha_n$ and the stimulus values $x_s$ are $D$-dimensional vectors. The $ns$-th element of the data matrix is $Y_{ns} = f(x_s, \alpha_n)$. Here $f$ is a translation-symmetric tuning curve, so that $f(x, \alpha) = g(x - \alpha)$ for some function $g$. We assume periodic boundary conditions along each dimension with period 1, and thus $x, \alpha \in [0, 1]^D$. The latent variable is assumed to be uniformly distributed, so that its values are drawn uniformly from $[0, 1]^D$. The neuron-neuron covariance matrix is given by $C_{mn} = \frac{1}{N_x} \sum_s f(x_s, \alpha_m) f(x_s, \alpha_n)$, where $N_x$ is the number of latent variable values. In the limit of large $N_x$ this sample mean will converge to $E_x[f(x, \alpha_m) f(x, \alpha_n)]$, where the expectation is taken over the latent variable distribution. Thus the covariance matrix can also be written as $C_{mn} = \int_S dx\, p(x) f(x, \alpha_m) f(x, \alpha_n)$, where $p(x)$ is the distribution over the latent variable, which we assume to be uniform, and the integral $\int_S dx = \int dx_1 \cdots dx_D$ is over all points $x$ in the region $S = [0, 1]^D$. Since the tuning curves are translation-symmetric we have

$$C_{mn} = \int_S dx\, f(x, \alpha_m) f(x, \alpha_n) = \int_S dx\, f(x - \alpha_n, \alpha_m - \alpha_n)\, f(x - \alpha_n, 0) = \int_{S - \alpha_n} dx'\, f(x', \delta)\, f(x', 0) = \int_S dx'\, f(x', \delta)\, f(x', 0) \equiv c(\delta).$$

In going from the first to the second expression, we have used the translation symmetry of the tuning curves ($f(x, \alpha) = f(x - \gamma, \alpha - \gamma)$); in going from the second to the third, we have made the change of variable $x' = x - \alpha_n$ and defined $\delta = \alpha_m - \alpha_n$; and in going from the third to the fourth, we have used the periodic boundary conditions to shift the integral over $S - \alpha_n$ back to $S$. $c$ is a periodic function of the difference in tuning curve centers, with period 1 along each dimension. Thus the covariance between two neurons depends only on the difference between their tuning curve centers.
In the one-dimensional case ($D = 1$), if we take the tuning curve centers to be equally spaced lattice points (i.e., $\alpha_n = n/N$ for $n = 0, \ldots, N - 1$), then the covariance matrix $C_{mn} = c((m - n)/N)$ is circulant, meaning that each row is a cyclically shifted copy of the row above.
Note that this holds for all $m, n = 1, \ldots, N$, since the periodic boundary conditions allow us to define $c$ for differences of either sign by wrapping them modulo the period. Finally, note that the results approximately hold if the boundary conditions are hard rather than periodic, provided that the tuning curves are not too wide. In this case, the deviations from perfect translation-invariance caused by the boundary conditions will be mild and restricted to a small subset of neurons (see Section S4).
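The following minimal sketch (our illustration; the parameter choices are arbitrary) constructs the covariance matrix for 1D Gaussian tuning curves on a periodic latent space and confirms that it is circulant, i.e., that $C_{mn}$ depends only on $(m - n) \bmod N$:

```python
import numpy as np

N, N_x, sigma = 32, 2048, 0.08
alpha = np.arange(N) / N                 # tuning curve centers on [0, 1)
x = np.arange(N_x) / N_x                 # uniform latent variable samples

# Periodic (wrapped) distances, then Gaussian tuning curves f(x, alpha).
d = x[None, :] - alpha[:, None]
d = (d + 0.5) % 1.0 - 0.5                # wrap differences to [-1/2, 1/2)
F = np.exp(-d**2 / (2 * sigma**2))       # N x N_x data matrix

C = F @ F.T / N_x                        # neuron-neuron covariance

# Circulant check: every row is a cyclic shift of the first row.
for m in range(N):
    assert np.allclose(C[m], np.roll(C[0], m))
print("C is circulant")
```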

S2.2 Eigenvalues of translation-symmetric covariance matrices
It is well known that matrices with translation-symmetric structure have eigenvalues that are given by the Fourier transform of the function that generates the matrix (i.e., $c$) [1, 2]. We briefly review those results here.
First, consider tuning curve centers that tile the latent space, forming the points of a lattice with $N_d$ tuning curve centers along the $d$-th dimension. That is, tuning curve centers along the $d$th dimension are spaced at intervals $1/N_d$ and take values in $\{1/N_d, 2/N_d, \ldots, 1\}$. There are $N = \prod_d N_d$ neurons in total, spanning all possible combinations of one choice of center for each dimension, and thus covering the total space of $[0, 1]^D$. The tuning curve parameter vector for the $n$th neuron is $\alpha_n$, whose $d$th component $\alpha_n^d$ is the choice of center for that neuron along dimension $d$.
Recall that the covariance matrix has entries $C_{mn} = c(\alpha_m - \alpha_n)$, where $c$ has period 1 along each dimension. To find the eigenvalues of $C$ (which we will index by $p$), consider the set of $N$ vectors $k_p$ given by

$$k_p = 2\pi (\kappa_p^1, \ldots, \kappa_p^D),$$

where each $\kappa_p^d \in \{1, \ldots, N_d\}$ (and, as with the tuning curve centers, the $k_p$ vectors range across all possible combinations of components, so that $p$ ranges from 1 to $N = \prod_d N_d$). Corresponding to each $k_p$, define the vector $v_p = (e^{i k_p \cdot \alpha_1}, \ldots, e^{i k_p \cdot \alpha_N})$. Then

$$(C v_p)_m = \sum_{n=1}^{N} c(\alpha_m - \alpha_n)\, e^{i k_p \cdot \alpha_n} = e^{i k_p \cdot \alpha_m} \left( \sum_{n=1}^{N} c(\delta_n)\, e^{-i k_p \cdot \delta_n} \right).$$

Here we have defined $\delta_n = \alpha_m - \alpha_n$ (note that $\delta_n$ also depends on $m$). As a consequence of the uniform spacing of tuning curve centers and periodic boundary conditions, as $n$ changes $\delta_n$ ranges over all possible tuning parameters. Thus, the sum over $n$ in parentheses is independent of both $m$ and $n$, and we can define $\lambda_p = \sum_{n=1}^{N} c(\delta_n) e^{-i k_p \cdot \delta_n}$. Consequently, $v_p$ is an eigenvector of $C$ with eigenvalue $\lambda_p$. Note that $\lambda_p$ is the term with frequency $k_p$ of the discrete Fourier transform of $c$.
In particular, for the one-dimensional case, the tuning curve parameter for the $n$th neuron, $\alpha_n$, is simply a scalar equal to $n/N$, and the frequency parameter $k_p$ is also a scalar, equal to $2\pi p$ (where both $n$ and $p$ range from 1 to $N$). $v_p = (e^{2\pi i p/N}, \ldots, e^{2\pi i p N/N})$ is an eigenvector with eigenvalue $\lambda_p = \sum_{n=1}^{N} c(n/N)\, e^{-2\pi i p n/N}$.

Alternatively, if tuning curve centers are uniformly distributed through the latent space, either evenly spaced or randomly chosen, then in the large $N$ limit we can consider the translation-invariant kernel $c(\alpha_m - \alpha_n)$, which is a continuous function of $\alpha_m, \alpha_n \in \mathbb{R}^D$. This is the continuous generalisation of the covariance matrix for translation-symmetric tuning curves derived above. The product of the matrix $C$ with a vector can then be approximated by the convolution of the kernel with a function $f_1(\alpha_n)$:

$$(C f_1)(\alpha_m) \approx \int d\alpha_n\, c(\alpha_m - \alpha_n) f_1(\alpha_n),$$

where $\int d\alpha_n$ means an integral over all $D$-dimensional vectors $\alpha_n$. $f_1(\alpha_n) = e^{i k \cdot \alpha_n}$ is an eigenfunction, where $k$ is an arbitrary $D$-dimensional vector:

$$\int d\alpha_n\, c(\alpha_m - \alpha_n)\, e^{i k \cdot \alpha_n} = \int d\delta\, c(\delta)\, e^{i k \cdot (\alpha_m - \delta)} = e^{i k \cdot \alpha_m} \left( \int d\delta\, c(\delta)\, e^{-i k \cdot \delta} \right).$$

In going from the first expression to the second, we have defined the new variable $\delta = \alpha_m - \alpha_n$ and used the periodic boundary conditions to replace the integral over $\alpha_n$ with an integral over $\delta$. The expression in parentheses does not depend on $\alpha_m$ or $\alpha_n$ and is the Fourier transform of $c(\delta)$. So $e^{i k \cdot \alpha_n}$ is an eigenfunction of $c(\alpha_m - \alpha_n)$ with eigenvalue $\int d\delta\, c(\delta) e^{-i k \cdot \delta}$. Finally, while these results are for uniformly distributed data with periodic boundary conditions, note that covariance matrices are symmetric and thus normal. Consequently, the effect of perturbations on the eigenvalue spectrum is as mild as possible, and the results should hold approximately for matrices that only approximately satisfy these conditions (see Section S4).
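For the discrete translation-symmetric case, this correspondence is easy to verify directly (a sketch of ours with an arbitrary Gaussian profile): the eigenvalues of a circulant covariance matrix match the discrete Fourier transform of its generating profile $c$:

```python
import numpy as np

N, sigma = 64, 0.08
delta = (np.arange(N) / N + 0.5) % 1.0 - 0.5   # differences wrapped to [-1/2, 1/2)
c = np.exp(-delta**2 / (4 * sigma**2))         # Gaussian covariance profile c(delta)

# Circulant matrix generated by c: C[m, n] = c((n - m) mod N).
C = np.array([np.roll(c, m) for m in range(N)])

eig_direct = np.sort(np.linalg.eigvalsh(C))[::-1]
eig_fourier = np.sort(np.real(np.fft.fft(c)))[::-1]   # DFT of the profile
print(np.allclose(eig_direct, eig_fourier))           # True
```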

S2.3 Gaussian tuning curves show a Gaussian covariance profile
Consider the case of Gaussian tuning, where the tuning curve of the $n$-th neuron is given by

$$f(x, \alpha_n) = K \exp\left( -\tfrac{1}{2} (x - \alpha_n)^T \Sigma^{-1} (x - \alpha_n) \right).$$

Here $\Sigma$ is the covariance matrix of the tuning curve and $K$ is an overall constant.
Note that Gaussians are defined on an infinite rather than circular interval. Thus, assuming that tuning curves are well-described by Gaussians requires that they are not too wide, so that the firing rate of a neuron centered at the middle of the interval decays to 0 by the ends of the interval. This is a mild requirement, equivalent to requiring that no neuron responds significantly to the entire range of latent variable values. However, the results presented in the previous section relating the covariance profile to the eigenvalue spectrum are general, as are the corresponding uncertainty principle-derived relationships between covariance profile and linear dimension, and do not require any particular form of the tuning curves. Thus the theory can be extended to wider tuning curves using a different functional form (e.g., von Mises functions) at the expense of simple closed-form expressions for linear dimension (note that separately, in Section S2.5, we show that for Gaussian tuning curves exponential growth of linear dimension does not apply if tuning curves are above a certain threshold width).
To generate the neuron-neuron covariance matrix, we wish to calculate the covariance profile $c(\delta) = \int_S dx\, f(x, 0) f(x, \delta)$, which is the covariance between pairs of neurons whose tuning curve centers are separated by $\delta$. Note that $c$ is periodic, with period 1 in each direction. For notational convenience we shift the range of $\delta$ so that each component lies within $[-1/2, 1/2]$ rather than $[0, 1]$. Thus,

$$c(\delta) = K^2 \int dx\, \exp\left( -\tfrac{1}{2} x^T \Sigma^{-1} x \right) \exp\left( -\tfrac{1}{2} (x - \delta)^T \Sigma^{-1} (x - \delta) \right) = K_2 \exp\left( -\tfrac{1}{4} \delta^T \Sigma^{-1} \delta \right).$$

Here the constant $K_2 = K^2 \sqrt{\det(\pi \Sigma)}$. Thus, neurons with Gaussian tuning curves have a Gaussian covariance profile whose covariance matrix, $2\Sigma$, is twice that of the tuning curve.
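As a sanity check on this calculation (a sketch of ours; the 1D parameters are arbitrary), the overlap integral of two shifted Gaussian tuning curves can be compared against the closed-form profile $K_2 \exp(-\delta^2/4\sigma^2)$:

```python
import numpy as np

sigma = 0.05
x = np.linspace(-0.5, 0.5, 20001)
dx = x[1] - x[0]

def f(x, center):
    return np.exp(-(x - center)**2 / (2 * sigma**2))

deltas = np.linspace(0, 0.2, 5)
numeric = np.array([np.sum(f(x, 0.0) * f(x, d)) * dx for d in deltas])
K2 = np.sqrt(np.pi) * sigma   # K^2 * sqrt(det(pi * Sigma)) in 1D, with K = 1
closed = K2 * np.exp(-deltas**2 / (4 * sigma**2))
print(np.allclose(numeric, closed, rtol=1e-4))   # True (tails are negligible)
```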

S2.4 Variance explained and linear dimension for one-dimensional Gaussian tuning curves
In the one-dimensional case, the covariance profile is

$$c(\delta) = K_2 \exp\left( -\frac{\delta^2}{4\sigma^2} \right). \tag{20}$$

If tuning curve centers are evenly spaced, this covariance profile is sampled at $\delta_n = n/N$, where $n$ takes values in $\{0, \ldots, N - 1\}$.
As described in Section S2.2, the eigenvalues of the covariance matrix are given by the Fourier transform of this profile. For convenience we index the eigenvalues by $p$ ranging from $\lceil (-N+1)/2 \rceil$ to $\lfloor N/2 \rfloor$. Ignoring the overall normalization constant $K_2$, for each such $p$ we have an eigenvalue

$$\lambda_p = \sum_{n} \exp\left( -\frac{\delta_n^2}{4\sigma^2} \right) e^{-2\pi i p \delta_n} = K_3 \exp\left( -4\pi^2 \sigma^2 p^2 \right),$$

where $K_3$ is a constant, and in the last step we have completed the square and noted that the remaining sum over $n$ is a constant. Thus the eigenvalues have a Gaussian profile in $p$, with width $\frac{1}{2\sqrt{2}\pi\sigma}$. Also note that the eigenvalues decrease monotonically with the magnitude of $p$ and that, except for $\lambda_0$, they occur in pairs with $\lambda_p = \lambda_{-p}$. Since the eigenvalues occur in pairs, the $(1-\epsilon)$-linear dimension is the smallest $L = 2P + 1$ such that

$$\sum_{p=-P}^{P} \lambda_p \geq (1 - \epsilon) \sum_{p} \lambda_p.$$

For large $N$, approximating both sides of this equation as an integral and canceling the common prefactor $K_3$ yields

$$\mathrm{erf}(\pi \sigma L) \geq 1 - \epsilon, \qquad \text{i.e.,} \qquad L_{1-\epsilon} \approx \frac{\mathrm{erf}^{-1}(1-\epsilon)}{\pi \sigma}.$$

Thus, $L_{1-\epsilon}$ grows as $\frac{1}{\sigma}$, with a proportionality constant that depends on the fraction of variance explained. In particular, for 95% variance explained, we have $L_{0.95} = \frac{1.96}{\sqrt{2}\pi\sigma}$.
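The $1/\sigma$ scaling can be checked directly (a sketch we add; the exact proportionality constant depends on the width conventions above): compute the eigenvalues of the circulant covariance matrix for several tuning widths and verify that $L_{0.95}\,\sigma$ is approximately constant:

```python
import numpy as np

def linear_dimension(eigvals, frac=0.95):
    v = np.sort(eigvals)[::-1]
    return int(np.searchsorted(np.cumsum(v) / v.sum(), frac) + 1)

N = 512
for sigma in [0.01, 0.02, 0.04, 0.08]:
    delta = (np.arange(N) / N + 0.5) % 1.0 - 0.5
    c = np.exp(-delta**2 / (4 * sigma**2))   # covariance profile
    eigs = np.real(np.fft.fft(c))            # eigenvalues via DFT
    L = linear_dimension(eigs)
    print(f"sigma={sigma:.2f}  L_0.95={L:3d}  L*sigma={L * sigma:.3f}")
```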

S2.5 Variance explained and exponential scaling of linear dimension for multidimensional Gaussians
We next consider a population of neurons with Gaussian tuning to an underlying $D$-dimensional latent variable. We assume that the tuning curve centers lie on a $D$-dimensional grid with $N_D$ equally spaced centers along each dimension. Therefore there are $N = N_D^D$ neurons. For simplicity we also assume that the width of the tuning is the same along each direction, so that the covariance matrix of the tuning curve $\Sigma$ is diagonal with elements $\sigma^2$ along the diagonal. Thus, as shown in Section S2.3, the covariance between two neurons whose tuning curve centers are separated by $\delta$ is $c(\delta) = K_2 \exp(-\|\delta\|^2/4\sigma^2)$ for $\delta \in [-1/2, 1/2]^D$, where $K_2$ is an overall scaling constant; $c(\delta)$ is extended periodically outside this range.
As for the one-dimensional case, and as discussed in Section S2.2, the eigenvalues of the neuron-neuron covariance matrix are given by the Fourier transform of $c(\delta)$. We can index the eigenvalues by a $D$-dimensional vector $p$ whose $d$th entry $p_d$ ranges from $\lceil (-N_D+1)/2 \rceil$ to $\lfloor N_D/2 \rfloor$. The magnitude of an eigenvalue is thus a function only of $\|p\|$, the Euclidean distance from the origin in $p$-space. As there are multiple lattice points with the same distance from the origin, there will be multiple eigenvalues with the same magnitude. Moreover, the number of eigenvalues at a given distance from the origin increases as the distance increases. Consequently, as the distance from the origin increases, there will be more eigenvalues but with smaller magnitude.
Overloading notation to define the $D$-dimensional continuous Gaussian function $\lambda : \mathbb{R}^D \to \mathbb{R}$ as $\lambda(\|p\|) = K_3 \exp(-4\pi^2 \sigma^2 \|p\|^2)$, observe that the eigenvalues are given by $\lambda(\|p\|)$ sampled at the integer lattice points of a $D$-dimensional cube with side length $N_D$.
To estimate the $(1-\epsilon)$ linear dimension, note that summing up the $L_{1-\epsilon}$ largest eigenvalues is equivalent to summing up all eigenvalues for which $\|p\| \leq R$ (i.e., eigenvalues corresponding to all lattice points within a ball of radius $R$), up to some edge effects resulting from the discreteness of the lattice points that will vanish when we take the continuous limit below. Thus we first compute the smallest radius $R$ such that

$$\sum_{\|p\| \leq R} \lambda(\|p\|) \geq (1 - \epsilon) \sum_{p} \lambda(\|p\|).$$

Approximating the sums by integrals of $\lambda(p)$, converting to radial coordinates with $r^2 = \sum_{d=1}^{D} p_d^2$, and then making a second change of coordinates $y = \sqrt{8}\pi\sigma r$ yields

$$\frac{1}{K_7} \int_0^{\sqrt{8}\pi\sigma R} y^{D-1} e^{-y^2/2}\, dy \geq 1 - \epsilon,$$

where $K_7 = 2^{(D/2)-1} \Gamma(D/2)$ normalizes the integral over the entire range. Note that $\lambda$ is the (unnormalized) probability density function of a multivariate Gaussian distribution, and the integrand above is the density function of a chi distribution with $D$ degrees of freedom (appropriately normalized as a result of dividing by the integral across the entire range). Thus, defining the random variable $Z$ distributed according to a chi distribution with $D$ degrees of freedom, the integral itself corresponds to the CDF of $Z$, and we seek the smallest $R$ such that $P(Z \leq \sqrt{8}\pi\sigma R) \geq 1 - \epsilon$. If $\epsilon < 1/2$ (i.e., we are capturing at least 50% of the variance in the data), then by definition $R$ must be at least the median value of $Z$. Standard results on chi distributions then show that

$$\sqrt{8}\pi\sigma R \geq \mathrm{median}(Z) \approx \sqrt{D} \left( 1 - \frac{2}{9D} \right)^{3/2} \geq 0.8\sqrt{D},$$

where the final inequality holds for $D \geq 2$. Consequently, $R \geq \frac{0.8\sqrt{D}}{\sqrt{8}\pi\sigma}$. Note that as $D$ gets larger, the factor of 0.8 in the lower bound can be strengthened to a factor of 1.
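The chi-distribution step is easy to verify numerically (a sketch of ours, using scipy's chi distribution): the median of a chi random variable with $D$ degrees of freedom indeed exceeds $0.8\sqrt{D}$ for $D \geq 2$ and approaches $\sqrt{D}$:

```python
import numpy as np
from scipy.stats import chi

for D in [2, 3, 5, 10, 50, 200]:
    median = chi.ppf(0.5, df=D)      # median of a chi distribution with D dof
    print(D, median / np.sqrt(D))    # ratio exceeds 0.8 and approaches 1
```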
Finally, to estimate the linear dimension itself we need to count the number of eigenvalues that lie within a sphere of radius $R$. Since the eigenvalues correspond to lattice points, the number of eigenvalues is approximately the volume of this $D$-dimensional sphere, yielding

$$L_{1-\epsilon} \geq \frac{\pi^{D/2} R^D}{\Gamma(D/2 + 1)} \geq \frac{1}{\sqrt{\pi D}} \left( \frac{K_4}{\sigma} \right)^D,$$

where $K_4$ is a constant, and we have used Stirling's approximation for the gamma function. If $\sigma < K_4$ then the quantity in parentheses is $> 1$, and thus this lower bound increases exponentially.
Consequently, the linear dimension grows exponentially with D as long as σ is not too large (note that the range along each dimension is normalized to [0, 1], and the threshold on σ effectively means that single tuning curves do not span the entire range of stimulus values).We show results for wide tuning curves and the onset of exponential scaling in Fig. S4.
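Because the Gaussian eigenvalue profile $\exp(-4\pi^2\sigma^2\|p\|^2)$ factorizes across the components of $p$, the full eigenvalue set is the $D$-fold tensor product of the 1D eigenvalue set. The sketch below (ours, with arbitrary parameters) uses this to compute $L_{0.95}$ exactly for moderate $D$ and exhibit the exponential growth:

```python
import numpy as np

def linear_dimension(eigvals, frac=0.95):
    v = np.sort(eigvals)[::-1]
    return int(np.searchsorted(np.cumsum(v) / v.sum(), frac) + 1)

N_d, sigma = 32, 0.05
delta = (np.arange(N_d) / N_d + 0.5) % 1.0 - 0.5
eig1d = np.real(np.fft.fft(np.exp(-delta**2 / (4 * sigma**2))))  # 1D eigenvalues

eigs = np.array([1.0])
for D in range(1, 5):
    eigs = np.outer(eigs, eig1d).ravel()   # tensor product across dimensions
    print(D, linear_dimension(eigs))       # grows roughly geometrically with D
```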

S2.6 Supra-exponential growth of linear dimension for localized tuning curves from uncertainty principles
In this section we consider genuinely localized tuning curves, by which we mean tuning curves that are contained within some finite support rather than having small but infinite tails (i.e., unlike the Gaussian tuning curves in the previous section).We then use the uncertainty principles of Wigderson & Wigderson [3] to show that in this case linear dimension grows supra-exponentially with D.
We assume that the neurons have $N_D$ equally spaced tuning curve centers along each dimension in the hypercube $[-1/2, 1/2]^D$. Thus, there are a total of $N_D^D$ neurons in a volume of $1^D$. If tuning curves have finite support, then the tuning curve of a neuron is contained within some fixed radius $R/2$. We assume that an individual tuning curve does not cover more than half the space, so that $R/2 < 1/4$. Consequently, the covariance between two neurons is 0 unless the centers of their firing fields lie within a distance $R < 1/2$. Thus,

$$c(\delta) = 0 \quad \text{for } \|\delta\| > R.$$

The support of the covariance profile $c$ is thus upper-bounded by the number of lattice points inside a sphere of radius $R$. Since there are $N_D^D$ points in a volume of $1^D$, the support of the covariance profile $c$ is

$$|\mathrm{supp}(c)| \leq N_D^D \, \frac{\pi^{D/2} R^D}{\Gamma(D/2 + 1)}.$$

As before, the eigenvalues of the covariance matrix are given by the Fourier transform of the covariance profile $c$. Let the eigenvalue profile be $\lambda$, with $\epsilon$-support $\mathrm{supp}_\epsilon(\lambda)$. From the uncertainty principle for $\epsilon$-support [3], we have

$$|\mathrm{supp}(c)| \cdot |\mathrm{supp}_\epsilon(\lambda)| \gtrsim N = N_D^D,$$

up to constant factors depending on $\epsilon$. As before, the size of the $\epsilon$-support of $\lambda$ is the $(1-\epsilon)$ linear dimension. Thus,

$$L_{1-\epsilon} \gtrsim \frac{\Gamma(D/2 + 1)}{\pi^{D/2} R^D} \geq \Gamma(D/2 + 1) \left( \frac{2}{\sqrt{\pi}} \right)^D,$$

using $R < 1/2$. By Stirling's approximation, for translation-symmetric tuning curves which are localized within a fixed radius, the linear dimension therefore grows faster than exponentially, as $\sqrt{D}^D$.
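The support trade-off is easy to see numerically in 1D (our illustration with boxcar tuning curves; the support counts below are for intuition only and do not reproduce the exact constants of [3]):

```python
import numpy as np

N, width = 256, 8                        # N lattice points; boxcar half-width in bins
f = np.zeros(N)
f[:width] = f[-width:] = 1.0             # periodic boxcar tuning curve

# Covariance profile = circular autocorrelation of the tuning curve,
# so its Fourier transform (the eigenvalue profile) is |fft(f)|^2.
lam = np.abs(np.fft.fft(f))**2
c = np.real(np.fft.ifft(lam))
support_c = np.sum(np.abs(c) > 1e-9)     # support size of the covariance profile

eps = 0.05
v = np.sort(lam)[::-1]
eps_support = np.searchsorted(np.cumsum(v) / v.sum(), 1 - eps) + 1
print(support_c, eps_support, support_c * eps_support, N)
# The product of the two support sizes stays above ~N; shrinking
# `width` (more localized tuning) raises the eps-support of lam.
```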

S3 Multiplicative tuning curves
In this section we show that for multiplicative tuning curves the $D$-dimensional covariance matrix and its eigenvalues can be expressed as a tensor product of 1D covariance matrices and products of 1D eigenvalues, respectively. We then find a lower bound on the linear dimension of data from such tuning curves using arguments from probability and information theory.
S3.1 Tensor products of matrices

Consider a set of matrices $\{A^1, \ldots, A^D\}$, with the dimension of the $d$th matrix being $N_d \times N_d$ (note that the superscripts index the matrices and are not powers). Let $A = A^1 \otimes \cdots \otimes A^D$ be their tensor product, as represented in matrix form by the Kronecker product. $A$ has dimension $N \times N$ with $N = \prod_d N_d$, and each index $m \in \{1, \ldots, N\}$ of $A$ corresponds to a $D$-tuple of factor indices $\Psi(m) = (\Psi_1(m), \ldots, \Psi_D(m))$, where $\Psi_d(m) \in \{1, \ldots, N_d\}$. By the definition of the Kronecker product we have

$$A_{mn} = \prod_{d=1}^{D} A^d_{\Psi_d(m)\, \Psi_d(n)}.$$
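A two-factor numpy check of this identity (our sketch, using row-major index unpacking for the hypothetical helper $\Psi$):

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2 = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
A = np.kron(A1, A2)                      # tensor product in matrix form

# Psi: unpack a flat index into per-factor indices (row-major, matching np.kron).
def psi(m, dims=(3, 4)):
    return np.unravel_index(m, dims)

m, n = 7, 10
i1, i2 = psi(m)
j1, j2 = psi(n)
assert np.isclose(A[m, n], A1[i1, j1] * A2[i2, j2])
print("A[m, n] equals the product of factor entries")
```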

S3.2 Covariance matrix and eigenvalues of covariance matrix for multiplicative tuning curves
We consider neural tuning curves that are a product of lower-dimensional factors. If $x$ is the latent variable and $\alpha_n$ is the vector of parameters for the $n$th neuron, we can write the tuning curve of the $n$-th neuron as

$$f(x, \alpha_n) = \prod_{d=1}^{D} f_d(x^d, \alpha_n^d).$$

Here the superscript denotes the $d$-th component of a vector, and the $f_d$'s are scalar functions. For simplicity, we assume that each multiplicative factor $f_d$ is a function of a one-dimensional variable $x^d$ (i.e., the $d$th component of $x$). However, in general the $f_d$'s could be functions of disjoint sets of multiple variables, and similar results hold.
Assuming that the distribution of the latent variable along each dimension is independent (i.e., $p(x) = \prod_d p(x^d)$) and rectangular boundary conditions, the covariance profile between neurons with tuning parameters $\alpha_m$ and $\alpha_n$ is

$$c(\alpha_m, \alpha_n) = \int dx\, p(x) f(x, \alpha_m) f(x, \alpha_n) = \prod_{d=1}^{D} \int dx^d\, p(x^d) f_d(x^d, \alpha_m^d) f_d(x^d, \alpha_n^d) \equiv \prod_{d=1}^{D} c_d(\alpha_m^d, \alpha_n^d),$$

where the last equality serves to define the individual covariance factors $c_d$.
The function c provides the covariance between any two neurons, while each function c d provides the portion of the covariance that comes from the similarity of responses along the dth dimension.
We next choose the locations of the tuning curve parameters to tile the space, forming the points of a lattice with $N_d$ tuning curve parameters along the $d$th dimension. Let the parameters along the $d$-th dimension be $\beta^d_1, \ldots, \beta^d_{N_d}$. Note that these parameters do not need to be equally spaced.
Since the neuron parameters tile the space, there are in total $N = \prod_d N_d$ neurons in the population, and the $D$-dimensional parameter vectors for the neurons in the population consist of all possible combinations of one entry from each of these parameter sets. Thus, each $\alpha_m^d \in \{\beta^d_1, \ldots, \beta^d_{N_d}\}$. For example, for the first neuron we have $\alpha_1 = (\beta^1_1, \ldots, \beta^D_1)$, and so on. Adopting the notation from Section S3.1, the corresponding parameter indices for the $m$th neuron are given by $\Psi(m)$; that is, $\alpha_m^d = \beta^d_{\Psi_d(m)}$. The resulting covariance matrix $C$ has dimension $N \times N$, where $N = \prod_d N_d$. Note that while the function $c$ encodes the covariance between any two neurons, the matrix $C$ contains the covariances for a particular set of neurons sampled from the population (i.e., the function $c$ evaluated for these particular neurons).
We next define a set of smaller matrices $C^d$, each of dimension $N_d \times N_d$. Let the $(r, s)$ entry of $C^d$ be $C^d_{rs} = c_d(\beta^d_r, \beta^d_s)$. The relationship between the functions $c_d$ and the matrices $C^d$ is analogous to the relationship between the function $c$ and the matrix $C$: the function $c_d$ encodes the covariance that comes from the $d$th factor for any two neurons, while the matrix $C^d$ captures these values for the particular tuning curve parameters (i.e., neurons) sampled in the population.
The $(m, n)$th entry of $C$ is the covariance between the $m$th and $n$th neurons, with parameter vectors $\alpha_m$ and $\alpha_n$, and can be expressed in terms of the $C^d$'s:

$$C_{mn} = \prod_{d=1}^{D} c_d(\alpha_m^d, \alpha_n^d) = \prod_{d=1}^{D} C^d_{\Psi_d(m)\, \Psi_d(n)}.$$

As described in Section S3.1, this functional form is simply the definition of the tensor product (as represented by the Kronecker product), and thus $C = C^1 \otimes \cdots \otimes C^D$. Finally, we express the eigenvalues of $C$ in terms of products of the eigenvalues of the $C^d$'s. Let $\{\gamma^d_{p_d}, u^d_{p_d}\}$ be the $p_d$th eigenvalue-eigenvector pair for each $C^d$. Note that

$$C \left( u^1_{p_1} \otimes \cdots \otimes u^D_{p_D} \right) = \left( \prod_{d=1}^{D} \gamma^d_{p_d} \right) u^1_{p_1} \otimes \cdots \otimes u^D_{p_D}.$$

All $N$ eigenvalues of $C$ can be constructed this way, as products of the eigenvalues of the factors $C^d$. Consequently, the eigenvalues of $C$ are given by all possible products of the eigenvalues of the individual factors.
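The factorization can be confirmed end-to-end (a sketch of ours with Gaussian factors; any factor shapes would do): build multiplicative 2D tuning curves on a grid of centers, form the neuron-neuron covariance directly, and compare with the Kronecker product of the 1D factor covariances:

```python
import numpy as np

N1, N2, sigma = 6, 5, 0.15
b1 = np.arange(N1) / N1                  # factor-1 tuning parameters
b2 = np.arange(N2) / N2                  # factor-2 tuning parameters
x = np.linspace(0, 1, 200)
w = x[1] - x[0]

g1 = np.exp(-(x[None, :] - b1[:, None])**2 / (2 * sigma**2))   # N1 x len(x)
g2 = np.exp(-(x[None, :] - b2[:, None])**2 / (2 * sigma**2))   # N2 x len(x)
C1, C2 = g1 @ g1.T * w, g2 @ g2.T * w    # 1D factor covariances

# Covariance of all N1*N2 multiplicative neurons, computed directly from
# the product tuning curves f(x1, x2) = g1(x1) * g2(x2):
F = (g1[:, None, :, None] * g2[None, :, None, :]).reshape(N1 * N2, -1)
C = F @ F.T * w * w
print(np.allclose(C, np.kron(C1, C2)))   # True: C = C1 (x) C2

# Eigenvalues of C are all pairwise products of the factor eigenvalues.
e = np.sort(np.linalg.eigvalsh(C))
ep = np.sort(np.outer(np.linalg.eigvalsh(C1), np.linalg.eigvalsh(C2)).ravel())
print(np.allclose(e, ep))                # True
```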

S3.3.1 Factors have same functional form
For simplicity, we first consider a multiplicative model where the tuning to each dimension (i.e., the functions $f_d$) takes the same functional form. Thus, each factor $C^d$ is the same. Let the eigenvalues of the covariance matrix for each factor be $\gamma = \{\gamma_1, \ldots, \gamma_{N_D}\}$. Note that as we are considering fraction of variance explained, we can rescale $\{\gamma_1, \ldots, \gamma_{N_D}\}$ to sum to 1. The $N$ eigenvalues of the overall covariance matrix $C$ are given by all products of the form $\prod_d \gamma_{p_d}$, where each $p_d \in \{1, \ldots, N_D\}$. To compute the linear dimension we reframe the problem as a problem in probability theory. Consider a set of $D$ independent random variables $Z_d$, each distributed according to a categorical distribution with outcome probabilities $\{\gamma_1, \ldots, \gamma_{N_D}\}$. That is, $Z_d$ takes values in $\{1, \ldots, N_D\}$ and $P(Z_d = k) = \gamma_k$. Moreover, let the joint random variable be $Z = (Z_1, \ldots, Z_D)$, where the $Z_d$'s are independent. Thus the probability distribution of $Z$ is the product of the distributions of the individual $Z_d$'s.
Each eigenvalue of $C$ is in one-to-one correspondence with an outcome of $Z$, and the value of the eigenvalue corresponds to the probability of that outcome. For example, the largest eigenvalue $\gamma_1^D$ corresponds to the case where $Z_1 = \cdots = Z_D = 1$ (which happens with probability $\gamma_1^D$). Finding the smallest set of eigenvalues whose sum is at least $1 - \epsilon$ (i.e., the $L_{1-\epsilon}$ linear dimension) is equivalent to finding the smallest set of outcomes of $Z$ whose probability is at least $1 - \epsilon$. This subset is often referred to as an $\epsilon$-high-probability set [4].
Standard results in information theory provide bounds on the size of this set [4]. In particular, the asymptotic equipartition property (AEP) shows that as $D$ increases, the number of elements in the high-probability set approaches $2^{D H(\gamma)}$, where $H(\gamma) = -\sum_p \gamma_p \log_2 \gamma_p$ is the Shannon entropy of the distribution $\{\gamma_1, \ldots, \gamma_{N_D}\}$. Given that each element in the high-probability set corresponds to an eigenvalue of the covariance matrix $C$, the number of eigenvalues required, and hence the $L_{1-\epsilon}$ dimension, asymptotically grows as $2^{D H(\gamma)}$, thus increasing exponentially with $D$.
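A direct check of this correspondence (a sketch of ours; the factor eigenvalue distribution below is arbitrary) computes the exact size of the $\epsilon$-high-probability set for increasing $D$ and prints it alongside $2^{D H(\gamma)}$:

```python
import numpy as np

gamma = np.array([0.5, 0.25, 0.15, 0.1])   # normalized factor eigenvalues
H = -np.sum(gamma * np.log2(gamma))        # Shannon entropy of the factor
eps = 0.05

probs = np.array([1.0])
for D in range(1, 9):
    probs = np.outer(probs, gamma).ravel()  # eigenvalues of the D-fold product
    v = np.sort(probs)[::-1]
    L = np.searchsorted(np.cumsum(v), 1 - eps) + 1   # eps-high-probability set size
    print(D, L, 2 ** (D * H))               # agreement is asymptotic in D
```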
In the main text, Fig. 4b-g, we numerically verify that exponential scaling applies to tuning curves when factors have the same functional form.

S3.3.2 Factors have different functional forms
In the case of non-identical factors, let the eigenvalues of the covariance matrix for the $d$th factor be $\{\gamma^d_1, \ldots, \gamma^d_{N_d}\}$. As before, we can normalize so that $\sum_p \gamma^d_p = 1$. Also as before, we define a set of $D$ independent categorical random variables $\{Z_1, \ldots, Z_D\}$, with $P(Z_d = k) = \gamma^d_k$, and observe that the eigenvalues of $C$ correspond to the probabilities of the outcomes of $Z = (Z_1, \ldots, Z_D)$.
To apply the AEP to this setting, note that the AEP uses the weak law of large numbers to show that the average of the log probability converges to its expectation value (i.e., the entropy), and this convergence applies to non-identical but independent random variables if the variance is bounded (Chebyshev inequality). The relevant quantity here is $\sum_p \gamma^d_p \log_2^2(\gamma^d_p)$, the second moment of the log probability when the $\gamma_p$'s are interpreted as probability values, which upper-bounds the variance. Thus we assume that $\sum_p \gamma^d_p \log_2^2(\gamma^d_p) < C_0$ for some constant $C_0$. Note that this boundedness is a very mild condition: $\gamma_p \log_2^2(\gamma_p)$ is bounded by approximately 1.13 on $[0, 1]$, so the only way for the sum to diverge as $d$ increases is if the number of non-zero eigenvalues for the $d$th factor and the spread of these eigenvalues both grow without bound as $d$ increases.
Given that the variance of the log probability is bounded, the AEP again holds, and the size of the high-probability set asymptotically approaches $2^{D \bar{H}}$, where $\bar{H} = \frac{1}{D} \sum_{d=1}^{D} H(\gamma^d)$ is the average entropy of the eigenvalue distribution of each factor.
Note that these results imply that, at least asymptotically, the linear dimension of the product of two factors or groups of factors is equal to the product of the linear dimensions of the factors or groups of factors.
In the main text, Fig. 4h-j, we numerically verify that exponential scaling applies to multiplicative population codes in which the factors are different. In Fig. S1, we also show results from a more general, model-agnostic framework. For these simulations, rather than choosing specific functional forms for the factors (i.e., the $f_d$'s in Eq. 33), we directly simulate eigenvalue distributions for the matrices $C^d$. Given that we can choose an overall normalization for the eigenvalues (since ultimately we are interested in the fraction of variance explained), we assume that the eigenvalues sum to 1 and thus form a probability distribution. This normalization allows us to generate candidate eigenvalue distributions using symmetric Dirichlet distributions, commonly used as a means to generate discrete distributions. The concentration parameter of the symmetric Dirichlet distribution determines how sparse the generated distributions are, and allows us to control the linear dimension of the individual factors. We use three different concentration parameters, moving from quite sparse to dense eigenvalue profiles for the single factors (shown in the first row of Fig. S1). We then compute the linear dimension of the tensor product and compare it to the theoretical lower bound, establishing exponential scaling.
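A minimal version of this model-agnostic simulation might look as follows (a sketch of ours; Fig. S1 uses its own parameter settings, including 8 factors of 10 eigenvalues each):

```python
import numpy as np

rng = np.random.default_rng(0)
conc, n_eigs, D_max, eps = 0.5, 10, 6, 0.05

# One eigenvalue distribution per dimension, from a symmetric Dirichlet prior.
factors = rng.dirichlet(conc * np.ones(n_eigs), size=D_max)
entropies = [-np.sum(f * np.log2(f)) for f in factors]

probs = np.array([1.0])
for D in range(1, D_max + 1):
    probs = np.outer(probs, factors[D - 1]).ravel()  # tensor-product eigenvalues
    v = np.sort(probs)[::-1]
    L = np.searchsorted(np.cumsum(v), 1 - eps) + 1
    H_bar = np.mean(entropies[:D])                   # average factor entropy
    print(D, L, 2 ** ((H_bar - 0.05) * D))           # linear dim vs. lower bound
```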

S3.3.3 Finite size results and convergence to asymptotic scaling
While the results above provide asymptotic scaling, exponential scaling holds for small $D$ as well, as we show in this section.
Non-asymptotic lower bound for identical factors. For a non-asymptotic lower bound on the linear dimension with identical factors, consider the case where only two of the eigenvalues of the matrix $C^d$ are nonzero. By normalizing them to sum to 1, we can write these eigenvalues as $1 - \gamma$ and $\gamma$, for some $\gamma \leq 0.5$. As for the multinomial case, the eigenvalues of $C$ are the tensor product of $[1 - \gamma, \gamma]$ taken $D$ times, i.e., $[1 - \gamma, \gamma] \otimes \cdots \otimes [1 - \gamma, \gamma]$ ($D$ times). In descending order of magnitude, there is 1 eigenvalue of magnitude $(1 - \gamma)^D$, $\binom{D}{1}$ eigenvalues of magnitude $(1 - \gamma)^{D-1} \gamma$, and so on, with $\binom{D}{k}$ eigenvalues of magnitude $(1 - \gamma)^{D-k} \gamma^k$. To lower bound the linear dimension, note that if $K$ is such that

$$\sum_{k=0}^{K} \binom{D}{k} (1 - \gamma)^{D-k} \gamma^k \leq 1 - \epsilon,$$

then the linear dimension $L \geq \sum_{k=0}^{K} \binom{D}{k}$. Thus, we will first find such a $K$ and then use it to lower bound the linear dimension $L$.

Consider a random variable $Z$ distributed according to the binomial distribution with $D$ trials and success probability $\gamma$; that is, $Z \sim \mathrm{Bin}(D, \gamma)$. Note that

$$P(Z \leq K) = \sum_{k=0}^{K} \binom{D}{k} \gamma^k (1 - \gamma)^{D-k}.$$

The median of this binomial distribution is at least $\lfloor \gamma D \rfloor$. Thus if we choose $K = \lfloor \gamma D \rfloor - 1$, we have $P(Z \leq K) \leq 1/2 \leq 1 - \epsilon$ (for $\epsilon \leq 1/2$), and therefore

$$L \geq \sum_{k=0}^{\lfloor \gamma D \rfloor - 1} \binom{D}{k},$$

which grows exponentially with $D$.
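This bound is easy to verify exactly with standard-library binomial coefficients (a sketch of ours; the helper function is illustrative):

```python
import numpy as np
from math import comb

def linear_dimension_two_eigs(gamma, D, eps=0.05):
    # Exact L_{1-eps} for eigenvalues [1-gamma, gamma] tensored D times.
    levels = [((1 - gamma) ** (D - k) * gamma ** k, comb(D, k)) for k in range(D + 1)]
    total, count = 0.0, 0
    for value, mult in levels:           # levels are in descending order (gamma <= 0.5)
        for _ in range(mult):
            total += value
            count += 1
            if total >= 1 - eps:
                return count
    return count

gamma = 0.2
for D in [5, 10, 15, 20]:
    K = int(np.floor(gamma * D)) - 1
    bound = sum(comb(D, k) for k in range(max(K, 0) + 1))
    print(D, linear_dimension_two_eigs(gamma, D), bound)   # L >= bound
```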
S4 Further numerical experiments

The analyses for Gaussian tuning curves in Sections S2.3 and S2.5 require that, in order for exponential scaling to hold, the tuning curves should not be "too wide". In Fig. S4 we show scaling for a variety of tuning curve widths, straddling the transition between exponential and sub-exponential scaling. As shown in the right inset, for exponential scaling to break down the tuning curves need to be very wide: wide enough that any single neuron responds to all values of the stimulus or latent variable.
In Fig. S5, we show how linear dimension behaves as a function of the number of neurons recorded for the multidimensional case (the one-dimensional case is included in Fig. 2 in the main text).These plots illustrate that if the number of neurons recorded is very low, then the linear dimension appears artificially low (note that the linear dimension is always less than or equal to the number of recorded neurons).However, it increases rapidly as more neurons are recorded until it reaches the true value.
In Fig. S6 we show that results for translation symmetric populations hold when tuning curves are only approximately shifted copies of each other rather than all having the same shape.
Finally, the results presented so far for translation-symmetric populations assume that tuning curve centers either tile the space (i.e., are located at the points of a lattice in the latent space) or are sampled randomly but uniformly through the space. However, there often exist privileged locations in stimulus or latent variable space that are encoded with a higher density of tuning curve centers. Examples include orientation tuning in area V1, where the horizontal and vertical directions have more tuned neurons [7], and place cells in the hippocampus, which cluster around reward and landmark locations [8]. To address this setting, in Fig. S7 we show simulated results from models with a mixture of tuning curves, consisting of a translation-symmetric background population (i.e., as shown in the main paper) and an additional population of neurons whose tuning curves are clustered around certain preferred locations (e.g., modeling cardinal directions in visual space or reward locations in a hippocampal place field map).
We consider two settings for the additional population:

1. In the first setting, shown in Fig. S7b and c, the added tuning curves have the same width as the background population; this situation would model, e.g., place fields clustering around reward or landmark locations. We find that the linear dimension is almost unchanged by these additional neurons. The linear dimension is preserved because, as long as the background population covers the space, the added neurons overlap with existing neurons. Thus adding more neurons at certain locations simply adds rows to the data matrix that can be well-approximated by linear combinations of existing rows, and the rank of the data matrix does not change. Of course, these additional neurons likely offer benefits in terms of faster decoding, noise reduction, and so on.
2. In the second setting, shown in Fig. S7d and e, the added tuning curves have sharper tuning than the rest of the population, reflecting higher-resolution encoding at certain locations. In this case, the added neurons effectively make the population code sparser and thus increase the linear dimension.
Thus, for both settings the linear dimension for the homogeneous case is a lower bound for the linear dimension of the inhomogeneous case.

Figure S1: Linear dimension from different eigenvalue distributions for each dimension. (a) Top panel: eigenvalue distributions (8 distributions, one for each dimension, of 10 eigenvalues each) were created from a symmetric Dirichlet distribution with concentration parameter $\alpha = 0.2$, yielding relatively sparse eigenvalue distributions. Bottom panel: linear dimension of the tensor product of the first $D$ eigenvalue distributions, along with the lower bound. The lower bound is calculated as $2^{(\bar{H} - 0.05)D}$, where $\bar{H}$ is the average entropy of the individual eigenvalue distributions. As in the main text, note that asymptotically $2^{(\bar{H} - \delta)D}$ is a lower bound for any $\delta$, so the choice of 0.05 is for convenience but shows that exponential scaling applies for small $D$. (b) As in (a) but for $\alpha = 0.5$. (c) As in (a) but for $\alpha = 0.8$.

Figure S4: Linear dimension for wider tuning curves. Linear dimension vs. $D$ for wide tuning, with $\sigma$ from 0.22 to 0.4 in steps of 0.02. Corresponding Gaussians of widths $\sigma$ = 0.22, 0.28, 0.34, 0.4 are shown on the side. The exponential scaling transitions to subexponential scaling once tuning curves cover the entire range of the period [0, 1].

Figure S5: Linear dimension as a function of the number of neurons ($N$). (a) Linear dimension against $N$ for $D = 2$. (b) Linear dimension against $N$ for $D = 3$.

Figure S6: Linear dimension for approximately translation-symmetric tuning curves. Each tuning curve was pointwise multiplied by a random matrix of the same shape as the tuning curves, whose elements were drawn uniformly from $[1 - \epsilon, 1 + \epsilon]$, with $\epsilon = 0.2$ (i.e., 20% multiplicative noise). The linear dimensions for data with noise lie within the 90% confidence interval of the linear dimensions of data without noise. (a) Example of 7 neurons with circular Gaussian tuning curves with multiplied noise. (b) Linear dimension against $D$ for data from $D$-dimensional Gaussian tuning curves multiplied by noise, for $\sigma = 0.1$. (c) Same as (b) for $\sigma = 0.2$.

Figure S7: Linear dimension of data from a background translation-symmetric population along with a higher density of neurons at special locations. (a) Schematic of the two populations, shown for 1D (left) and 2D (right). One population (black) has tuning centers that evenly cover the space; a second population (blue) has tuning centers that cluster around certain locations. (b) Both populations have tuning width $\sigma = 0.1$. (c) As in (b) but for $\sigma = 0.2$. (d) The background population has tuning width 0.1, while the population clustered at special locations has a sharper tuning width of 0.05. (e) As in (d) but with tuning widths of 0.2 and 0.1.