Mathematical Appendix

1 Translation-independent connectivity implies delocalized eigenvectors

In this section, we show that a linear network with translation-independent connectivity has delocalized eigenvectors, even if the connectivity is strongly localized. This is equivalent to the well-known fact that the eigenvectors of a circulant matrix are the basis functions of a discrete Fourier transform and to Bloch's theorem in solid-state physics [1].
A linear network with N nodes has an N -dimensional coupling matrix, W . If the connectivity profile is translation-independent then W (j, k) = c(j − k), where c is a periodic function with period N . W is thus a circulant matrix.
For an integer $n$, define $\omega_n = 2\pi n/N$. Applying $W$ to the vector $v_\lambda = [e^{i\omega_n}, e^{2i\omega_n}, \ldots, e^{Ni\omega_n}]$ yields

$$(Wv_\lambda)(j) = \sum_{k=1}^{N} c(j-k)\, e^{ik\omega_n} = e^{ij\omega_n}\left(\sum_{p=j-N}^{j-1} c(p)\, e^{-i\omega_n p}\right).$$

The function in parentheses is periodic with period $N$, and we shift the limits in the sum to give

$$(Wv_\lambda)(j) = \lambda\, e^{ij\omega_n}, \qquad \lambda = \sum_{p=1}^{N} c(p)\, e^{-i\omega_n p}, \tag{1}$$

where $\lambda$ is the eigenvalue corresponding to $v_\lambda$. So $Wv_\lambda = \lambda v_\lambda$ and $v_\lambda$ is an eigenvector of $W$ with eigenvalue $\sum_{p=1}^{N} c(p)\, e^{-i\omega_n p}$. Note that $\lambda$ is the Fourier transform of the connectivity profile at frequency $\omega_n$. Every component of $v_\lambda$ has absolute value 1, so $v_\lambda$ is maximally delocalized. Any network with translation-independent connectivity will therefore show delocalized activity patterns and, because a circulant matrix is normal (i.e. its eigenvalue decomposition is well-conditioned), the results are similar if the connectivity profile is only approximately the same at each node.
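The circulant result above is easy to verify numerically. The following sketch (an illustration with an arbitrary localized profile, not taken from the original analysis) checks that Fourier modes are exact eigenvectors of a circulant coupling matrix, with eigenvalues given by the DFT of the connectivity profile, and that the eigenvectors have flat magnitude even though the connectivity is strongly localized:

```python
import numpy as np

# A strongly localized (but otherwise arbitrary) periodic connectivity profile.
N = 64
c = np.exp(-np.minimum(np.arange(N), N - np.arange(N)) / 2.0)

# Circulant matrix: W[j, k] = c[(j - k) mod N].
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = c[(j - k) % N]

for n in [0, 1, 5, 17]:
    omega_n = 2 * np.pi * n / N
    v = np.exp(1j * omega_n * np.arange(N))                  # candidate eigenvector
    lam = np.sum(c * np.exp(-1j * omega_n * np.arange(N)))   # DFT of c at omega_n
    assert np.allclose(W @ v, lam * v)                       # W v = lambda v
    assert np.allclose(np.abs(v), 1.0)                       # maximally delocalized
```

The same check fails for the position-dependent architectures considered below, which is the starting point of the next section.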

2 The functional form of localized eigenvectors from a first-order expansion
We now consider a linear network with a localized connectivity profile that varies between nodes. We explore conditions under which this gives rise to localized eigenvectors and derive the equation for the shape of these eigenvectors (Eq. 2 in the main text). As in the previous section, we rewrite the connectivity matrix in terms of a relative coordinate $p = j - k$. However, since connectivity now depends on absolute position (and not just the relative separation between nodes), we use an additional variable to index position. Let

$$c(j, p) = W(j, j - p).$$

Thus $c(j, 2) = W(j, j-2)$ indexes feedforward projections that span two nodes, and $c(5, p) = W(5, 5-p)$ indexes projections to node 5. For any fixed $j$, $c(j, p)$ is defined from $p = j - N$ to $p = j - 1$. We extend the definition of $c(j, p)$ to values of $p$ outside this range by defining $c(j, p)$ to be periodic in $p$, with the period equal to the size of the network. This is purely a formal convenience to simplify the limits in certain sums and does not constrain the connectivity between the nodes of the network.

Consider the candidate eigenvector $v_\lambda(j) = g_\lambda(j)\, e^{i\omega j}$. This is a generalization of the eigenvectors in Eq. 1. The dependence of $g_\lambda$ on $j$ allows the magnitude of the eigenvector to depend on position; setting this function equal to a constant returns us to the translation-independent case. Moreover, note that $g_\lambda(j)$ depends on $\lambda$, meaning that eigenvectors corresponding to different eigenvalues (timescales) can have different shapes. For example, different eigenvectors can be localized to different degrees, and localized and delocalized eigenvectors can coexist. $\omega$ allows the eigenvector to oscillate across nodes, as in Eq. 1. As before, $\omega$ varies between eigenvectors and so depends on $\lambda$.
Applying $W$ to $v_\lambda$ yields

$$(Wv_\lambda)(j) = \sum_{k} W(j,k)\, g_\lambda(k)\, e^{i\omega k} = e^{i\omega j}\left[\sum_{p} c(j,p)\, g_\lambda(j-p)\, e^{-i\omega p}\right]. \tag{4}$$

Here the term in brackets is no longer independent of $j$. So far we have made no use of the requirement of local connectivity and, given that $g_\lambda$ is an arbitrary function of position that can differ between timescales, we have placed no constraints on the shape of the eigenvectors. By including an oscillatory term ($e^{i\omega j}$) in our ansatz, we ensure that $g_\lambda(j)$ is constant when connectivity is translation-invariant; this will simplify the analysis.
We now approximate both $c(j,p)$ and $g_\lambda(j-p)$ to first order (i.e. linearly):

$$c(j,p) \approx c(j_0, p) + (j - j_0)\, f_1(p), \qquad g_\lambda(j-p) \approx g_\lambda(j) - p\, f_2(j). \tag{5}$$

Here $j_0$ is a putative center of the eigenvector. $f_1$ and $f_2$ are functions that provide the slope in a linear approximation of $c$ and $g_\lambda$. There are many such possibilities and, at the moment, we have placed no constraints on them. In the continuous limit, setting $f_1$ and $f_2$ to be the appropriate derivatives yields the best local linear approximation (and the expansion is then a first-order Taylor series). Substituting Eq. 5 into Eq. 4 we get

$$(Wv_\lambda)(j) = e^{i\omega j}\left[\, g_\lambda(j) \sum_p c(j_0,p)\, e^{-i\omega p} + (j-j_0)\, g_\lambda(j) \sum_p f_1(p)\, e^{-i\omega p} - f_2(j) \sum_p p\, c(j_0,p)\, e^{-i\omega p} + \text{higher-order terms} \right]. \tag{6}$$

We expect these approximations to be valid only locally. However, if connectivity is local then the major contribution to the sum comes from small values of $p$. For large values of $p$, $g_\lambda(j-p)$ is multiplied by connectivity strengths close to 0, and so we only need to approximate $g_\lambda$ for $p$ close to 0. Similarly, in approximating $c(j,p)$ around $j = j_0$, we expect our approximation to be good in the vicinity of $j = j_0$. However, if our eigenvector is indeed localized around $j_0$, then $g_\lambda(k)$ is small when $|k - j_0|$ is large. For small $p$, large values of $|k - j_0|$ approximately correspond to large values of $|j - j_0|$, and so $c(j,p)$ makes a contribution to the sum only when $j \approx j_0$.
The zeroth-order term in Eq. 6 is

$$e^{i\omega j}\, g_\lambda(j) \left(\sum_p c(j_0,p)\, e^{-i\omega p}\right).$$

As before, the function in parentheses is periodic in $p$ with period $N$ (recall that $c(j,p)$ was extended to be periodic in $p$). Thus to zeroth order $v_\lambda$ is an eigenvector with eigenvalue

$$\lambda(j_0, \omega) = \sum_{p=1}^{N} c(j_0, p)\, e^{-i\omega p}. \tag{7}$$

Before proceeding further, we contrast this expression with Eq. 1, which is similar except that there $c$ does not depend on $j_0$ (translation-invariance). Eq. 1 tells us that the eigenvectors are described by a common functional form ($e^{i\omega j}$), with a parameter ($\omega$) that varies from eigenvector to eigenvector. It also specifies an analytic relation between the $\omega$ for a particular eigenvector and the corresponding eigenvalue. We seek a similar relationship for the more general case where connectivity depends on position.
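The zeroth-order statement can be checked directly: for slowly varying connectivity, applying $W$ to a wave packet centered at $j_0$ approximately multiplies it by $\lambda(j_0,\omega)$. The sketch below builds an exponential-profile matrix with a weak gradient of self-couplings (all parameter values are ours, chosen only for illustration) and compares the two:

```python
import numpy as np

# For slowly varying connectivity, (W v)(j0) / v(j0) should approximately
# equal lambda(j0, omega) = sum_p c(j0, p) e^{-i omega p}  (Eq. 7).
N, j0, omega, sigma = 400, 200, np.pi, 20.0
mu0, dr, mu_f, mu_b, lc = 1.0, 0.002, 1.0, 0.5, 2.0

j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.where(j > k, mu_f * np.exp(-(j - k) / lc),
             np.where(j < k, mu_b * np.exp(-(k - j) / lc), 0.0))
W += np.diag(mu0 + dr * np.arange(N))     # weak gradient of self-couplings

# Gaussian-windowed plane wave centered at j0.
x = np.arange(N)
v = np.exp(-(x - j0) ** 2 / (2 * sigma ** 2)) * np.exp(1j * omega * x)

# lambda(j0, omega): Fourier transform of row j0's connectivity profile.
lam = np.sum(W[j0, :] * np.exp(-1j * omega * (j0 - x)))

ratio = (W @ v)[j0] / v[j0]
assert abs(ratio - lam) / abs(lam) < 0.15   # close, up to higher-order terms
```

The residual error shrinks as the packet widens (here roughly as $1/\sigma^2$), consistent with the first-order corrections discussed next.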
For $\lambda$ to be an exact eigenvalue in Eq. 6, the higher-order terms should vanish. By setting the first-order term in this equation to 0, we obtain an equation for $g_\lambda(j)$:

$$(j - j_0)\, g_\lambda(j) \sum_p f_1(p)\, e^{-i\omega p} = f_2(j) \sum_p p\, c(j_0,p)\, e^{-i\omega p}. \tag{8}$$

To get a closed-form expression for $g_\lambda$, we need explicit forms for $f_1$ and $f_2$. If the number of nodes in the network is large, or if we assume that both $c$ and $g_\lambda$ are sampled from some underlying smooth function (as is the case in our examples), then $f_1$ and $f_2$ are naturally chosen to be the derivatives, which are the best local linear approximations (by definition). In this case we set

$$f_1(p) = \frac{\partial c}{\partial j}(j_0, p), \qquad f_2(j) = \frac{dg_\lambda}{dj}(j).$$

Substituting into Eq. 8 gives us a differential equation in $g_\lambda$:

$$\frac{dg_\lambda}{dj} = -\frac{(j - j_0)}{\alpha^2}\, g_\lambda(j), \tag{9}$$

where

$$\alpha^2 = -\frac{\sum_p p\, c(j_0,p)\, e^{-i\omega p}}{\sum_p \frac{\partial c}{\partial j}(j_0,p)\, e^{-i\omega p}}. \tag{10}$$

Thus $\alpha^2$ is a ratio of discrete Fourier transforms at the frequency $\omega$. Note that the denominator is a weighted measure of network heterogeneity at the location $j_0$. Also note that $\alpha^2$ can be written in terms of $\lambda$ as (compare the twist condition of [3]):

$$\alpha^2 = -i\, \frac{\partial \lambda / \partial \omega}{\partial \lambda / \partial j_0}. \tag{11}$$

Solving for $g_\lambda$ in Eq. 9 yields

$$g_\lambda(j) = C_1\, e^{-(j - j_0)^2 / 2\alpha^2},$$

where $C_1$ is a constant. Thus, to first order, the eigenvector is given by the modulated Gaussian function

$$v_\lambda(j) = C_1\, e^{-(j - j_0)^2 / 2\alpha^2}\, e^{i\omega j}. \tag{12}$$

In general, $\alpha$ can be complex. In order for $v_\lambda$ to be localized, $\mathrm{Re}(\alpha^2)$ must be positive for the corresponding values of $j_0$ and $\omega$, and we only accept an eigenvector as a valid solution if this is the case. Thus the approach is self-consistent: we assumed that there existed a localized eigenvector, combined this with the requirement of local connectivity to solve for its putative shape, and then restricted ourselves to solutions that did indeed conform to our initial assumption.
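The equivalence of the two expressions for $\alpha^2$ (the ratio of sums, Eq. 10, and the derivative form, Eq. 11) can be confirmed numerically. The sketch below does so for the exponential-profile architecture with a gradient of self-couplings, comparing the truncated sums against central finite differences of $\lambda(j_0, \omega)$; the parameter values are ours and purely illustrative:

```python
import numpy as np

# Compare Eq. 10 (ratio of discrete Fourier transforms) with
# Eq. 11 (alpha^2 = -i (dlambda/domega) / (dlambda/dj0)).
mu0, dr, mu_f, mu_b, lc = 1.0, 0.1, 1.0, 0.4, 2.0
P = 2000                      # truncation of the rapidly converging sums
p = np.arange(1, P + 1)
decay = np.exp(-p / lc)

def lam(j0, w):
    # lambda(j0, omega) = sum_p c(j0, p) e^{-i omega p}  (Eq. 7)
    return (mu0 + dr * j0
            + np.sum(mu_f * decay * np.exp(-1j * w * p))
            + np.sum(mu_b * decay * np.exp(+1j * w * p)))

j0, w, h = 50.0, 2.0, 1e-6

# Eq. 10: ratio of sums (the denominator is dc/dj = dr, nonzero only at p = 0).
num = (np.sum(p * mu_f * decay * np.exp(-1j * w * p))
       - np.sum(p * mu_b * decay * np.exp(+1j * w * p)))
alpha2_sums = -num / dr

# Eq. 11: central finite differences of lambda, treating j0 as continuous.
dlam_dw = (lam(j0, w + h) - lam(j0, w - h)) / (2 * h)
dlam_dj0 = (lam(j0 + h, w) - lam(j0 - h, w)) / (2 * h)
alpha2_deriv = -1j * dlam_dw / dlam_dj0

assert np.allclose(alpha2_sums, alpha2_deriv, rtol=1e-4)
```

The same check works for any architecture for which $\lambda(j_0, \omega)$ can be evaluated, which is how one would apply Eq. 11 in practice.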
In Eq. 6 we expand both $g_\lambda$ and $c$ to first order, using the functions $f_1$ and $f_2$. These functions are somewhat arbitrary, in that they are only required to be good local linear approximations. In the large-$N$ limit (or if $c(j,p)$ is sampled from some underlying continuous function) they are most naturally taken to be the derivatives, since the derivative is the best local linear approximation. Taking $f_2$ to be the derivative of $g_\lambda$ has the advantage of giving a differential equation for $g_\lambda$, which can be solved to yield a closed-form expression. More generally, though, for $f_2$ we could use a discrete difference to approximate the changes in $g_\lambda$ (e.g. $\Delta g = g(j+1) - g(j)$) and attempt to solve the resulting difference equation. On the other hand, the replacement of $f_1$ by $\partial c / \partial j\,(j_0, p)$ only serves to help us find closed-form expressions for $\alpha^2$ and has no consequences for the functional form of Eq. 9.
Finally, we include an oscillatory term (i.e. $e^{i\omega j}$) in our ansatz so that a zeroth-order expansion in the connectivity (i.e. connectivity that is locally translation-invariant) yields eigenvectors given by a zeroth-order expansion in $g_\lambda$ (i.e. $g_\lambda$ locally constant). This ensures that the zeroth-order expansion locally restates the conclusions of Section 1.
Before we proceed to applications of this analysis, it is useful to step back and to clarify what our theory has given us.
Contrast our result with the discussion of Eq. 1 that followed Eq. 7. For a particular class of networks/matrices we have identified a general functional form for the eigenvectors (that of a modulated Gaussian), with two parameters ($j_0$ and $\omega$) that vary from eigenvector to eigenvector. We have also specified an analytic relation between the $j_0$ and $\omega$ for a particular eigenvector and the corresponding eigenvalue (or timescale): given an eigenvalue $\lambda$, we use Eq. 7 to find the values of $j_0$ and $\omega$ that correspond to $\lambda$ (note that we are matching two real parameters to one complex value). We use these to calculate $\alpha^2$. However, because of our initial assumption that the eigenvector under consideration was localized, our result only applies to eigenvectors for which $\alpha^2 > 0$.

For a circulant matrix, the parameter $\omega_n$ in Eq. 1 is known and the eigenvalues can be computed from the $\omega_n$. In general, for the matrices we consider, our method does not provide a way to compute the eigenvalues analytically and, given that we consider a large class of matrices, such a formula would be unlikely. Instead our theory identifies how the shape of a particular localized eigenvector depends on the network architecture and on its corresponding timescale.
While the theory does not tell us which timescales a finite network displays (stronger asymptotic results may be possible), it identifies a region of the complex plane within which the eigenvalues of the network lie. This is the range of $\lambda(j_0, \omega)$ swept out by varying $j_0$ and $\omega$. For each value of $j_0$ and $\omega$ we can compute $\alpha^2$, and eigenvalues/timescales that lie in the subset of this region where $\alpha^2 > 0$ correspond to localized eigenvectors (this is equivalent to the twist condition in [3]). Thus our theory tells us where the onset of localization lies and how changing the network parameters moves the boundary between regions of localized and delocalized timescales. If $\alpha^2$ is always positive then all eigenvectors are localized. Moreover, the values of $j_0$ at which $\alpha^2$ is positive tell us which nodes can potentially host localized timescales, which can be useful in analyzing networks with a mixture of localized and delocalized timescales. For an example of such a network, see Figure 3 - Figure Supplement 1.

The theory tells us that, to first order, localized eigenvectors are modulated Gaussians with two parameters that depend both on the connectivity architecture and on the particular eigenvalue/timescale under consideration. Some eigenvectors/timescales are localized (i.e. have a positive width parameter $\alpha^2$) and others are delocalized (have a negative $\alpha^2$). Both the existence of localization and the width of the localized eigenvectors can depend on the timescale, and thus to predict the shape of a particular eigenvector we need to know the timescale it corresponds to. However, once we know this timescale we have an analytic expression for how the eigenvector shape depends on network parameters and for how changing network parameters promotes or hinders localization (of course, changing network parameters will also change the eigenvalue, but if the parameters are changed continuously then Eq. 7 tells us how the eigenvalue changes locally).

In a number of cases, we can draw further conclusions about the shapes of all localized eigenvectors in a network without computing the eigenvalues. For example, in the network of Fig. 3 (analyzed further in Section 3 below), the eigenvector width does not depend on the location of the eigenvector. This is a special example but not an unnatural one; a gradient of local properties is one of the simplest deterministic ways to break translation-invariance, and as such is particularly interesting theoretically. For the network of Fig. 4 (see Section 4 below), the eigenvectors are all localized and the width depends only weakly on position. Such simplifications are possible because eigenvector width depends on $j_0$ and $\omega$ (Eq. 10) in a relatively simple way.
If there are constraints on the spectrum/timescales of the network, then the theory allows us to translate these into constraints on the shapes of the corresponding eigenvectors. For example, if a network has low-rank connectivity then the spectrum of the corresponding matrix has a cluster of eigenvalues around the intrinsic timescale of each node. As another example, models with connectivity that decays very rapidly with distance will tend to have eigenvalues clustered around the real axis (with distance bounded by Gerschgorin disks), which forces $\omega$ to be close to 0 or $\pi$. This is also approximately true for the network architecture of Fig. 3, where the eigenvalues lie on the real line (in the case with only nearest-neighbor connections this is exact and can be shown with a similarity transform).
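The Gerschgorin argument above is easy to illustrate: for a matrix whose diagonal is real, every eigenvalue lies in a disk centered on the real axis, so the imaginary parts are bounded by the largest off-diagonal row sum, and rapidly decaying connectivity makes that bound small. A minimal sketch (with parameter values of our choosing, for illustration only):

```python
import numpy as np

# Gerschgorin bound: with a real diagonal, |Im(lambda)| is at most the
# largest off-diagonal absolute row sum.
N, mu0, dr, mu_f, mu_b, lc = 200, 1.0, 0.05, 1.0, 0.2, 0.5

j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.where(j > k, mu_f * np.exp(-(j - k) / lc),
             np.where(j < k, mu_b * np.exp(-(k - j) / lc), 0.0))
W += np.diag(mu0 + dr * np.arange(N))

radii = np.sum(np.abs(W), axis=1) - np.abs(np.diag(W))   # disk radii
eigvals = np.linalg.eigvals(W)

# Disk centers (the diagonal) are real, so imaginary parts are bounded
# by the largest radius; fast decay (small lc) makes the radii small.
assert np.max(np.abs(eigvals.imag)) <= np.max(radii) + 1e-8
```

Shrinking $l_c$ tightens the bound, pushing the spectrum toward the real axis and $\omega$ toward 0 or $\pi$.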
Finally, the analysis reveals the qualitative factors that are important for localization in a network (i.e. breaking translation-invariance, asymmetry) and can help in conceiving of possible network architectures. The first example (Fig. 3) emerged from an attempt to understand the origins of timescale localization in a model of large-scale interactions in the cortex. The second network (Fig. 4) was designed from a calculation requiring the width of localized eigenvectors to be independent of $\omega$.

3 Localized eigenvectors for a network with a gradient of self-coupling strengths
We now apply the theory from the previous section to derive Eq. 4 in the main text.
Recall that the connectivity for this architecture is given by

$$W(j,k) = \begin{cases} \mu_0 + \Delta_r\, j & j = k \\ \mu_f\, e^{-(j-k)/l_c} & j > k \\ \mu_b\, e^{-(k-j)/l_c} & j < k \end{cases} \tag{13}$$

Define $p = j - k$ and rewrite the connectivity in terms of a relative coordinate:

$$c(j,p) = \begin{cases} \mu_0 + \Delta_r\, j & p = 0 \\ \mu_f\, e^{-p/l_c} & p > 0 \\ \mu_b\, e^{p/l_c} & p < 0 \end{cases} \tag{14}$$

From Eq. 7 the eigenvalues are

$$\lambda(j_0, \omega) = \mu_0 + \Delta_r\, j_0 + \sum_{p > 0} \mu_f\, e^{-p/l_c}\, e^{-i\omega p} + \sum_{p < 0} \mu_b\, e^{p/l_c}\, e^{-i\omega p}. \tag{15}$$

For a particular eigenvalue $\lambda$, we solve this equation to give $j_0$ and $\omega$ for the corresponding eigenvector. We then use Eq. 10 to calculate $\alpha^2$; since $\partial c / \partial j = \Delta_r$ at $p = 0$ and vanishes otherwise,

$$\alpha^2 = -\frac{1}{\Delta_r}\left[\sum_{p > 0} p\, \mu_f\, e^{-p/l_c}\, e^{-i\omega p} + \sum_{p < 0} p\, \mu_b\, e^{p/l_c}\, e^{-i\omega p}\right]. \tag{16}$$

Summing the geometric series allows us to simplify Eq. 15, yielding

$$\lambda(j_0, \omega) = \mu_0 + \Delta_r\, j_0 + \frac{\mu_f\, e^{-i\omega}}{e^{1/l_c} - e^{-i\omega}} + \frac{\mu_b\, e^{i\omega}}{e^{1/l_c} - e^{i\omega}},$$

and noting that the sums in Eq. 16 are derivatives of those in Eq. 15 gives

$$\alpha^2 = -\frac{1}{\Delta_r}\left[\frac{\mu_f\, e^{1/l_c}\, e^{-i\omega}}{(e^{1/l_c} - e^{-i\omega})^2} - \frac{\mu_b\, e^{1/l_c}\, e^{i\omega}}{(e^{1/l_c} - e^{i\omega})^2}\right].$$

By matching the eigenvalues to $j_0$ and $\omega$ we find that $\omega = \pi$, so that the equation for $\alpha^2$ can be further simplified to give:

$$\alpha^2 = \frac{(\mu_f - \mu_b)}{\Delta_r}\, \frac{e^{1/l_c}}{(e^{1/l_c} + 1)^2}.$$
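The result can be checked numerically. The sketch below confirms that the truncated sums of Eq. 16 at $\omega = \pi$ match the closed form $\alpha^2 = (\mu_f - \mu_b)\, e^{1/l_c} / (\Delta_r (e^{1/l_c} + 1)^2)$ derived here, and that the eigenvectors of the corresponding matrix are indeed strongly localized; the parameter values are ours and chosen only for illustration:

```python
import numpy as np

N, mu0, dr, mu_f, mu_b, lc = 300, 0.0, 0.05, 1.0, 0.2, 1.0

# Truncated sums of Eq. 16 at omega = pi versus the closed form.
p = np.arange(1, 500)
num = np.sum(p * (mu_f - mu_b) * np.exp(-p / lc) * (-1.0) ** p)
alpha2_sum = -num / dr
alpha2_closed = (mu_f - mu_b) * np.exp(1 / lc) / (dr * (np.exp(1 / lc) + 1) ** 2)
assert np.allclose(alpha2_sum, alpha2_closed)
assert alpha2_closed > 0                     # localized regime

# Eigenvectors of the full matrix are localized: participation ratios
# (effective number of participating nodes) are much smaller than N.
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.where(j > k, mu_f * np.exp(-(j - k) / lc),
             np.where(j < k, mu_b * np.exp(-(k - j) / lc), 0.0))
W += np.diag(mu0 + dr * np.arange(N))
_, V = np.linalg.eig(W)
prob = np.abs(V) ** 2 / np.sum(np.abs(V) ** 2, axis=0)
participation = 1.0 / np.sum(prob ** 2, axis=0)
assert np.median(participation) < 25
```

We deliberately assert only a loose localization bound on the eigenvectors here, since the first-order theory is an approximation and edge eigenvectors behave differently.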

4 Localized eigenvectors for a network with a gradient of connectivity decay lengths
Recall that for this architecture the connectivity decays exponentially with distance, with a decay length that varies across the network, and takes one form for $j < k$ and another for $j > k$. We can calculate $\alpha^2$ as in Eqs. 10 or 11. While we will typically do this numerically, we can get an approximate closed-form expression by noting that, as a consequence of local connectivity, most of the contribution to the sums in Eq. 20 comes from small values of $p$ and, provided that $j_0$ is not too small, we can replace $j_0 - p$ by $j_0 \pm 1$. The resulting sums are geometric series and can be summed in closed form. As an added simplification we can, as in the previous section, set $\omega = \pi$. The theoretical predictions plotted in Fig. 4 are from this approximation.

5 The functional form of localized eigenvectors from a second-order expansion
In Section 2 we expanded both $c(j,p)$ and $g_\lambda(j)$ to first order, and solved for the functional form of $g_\lambda$. However, as shown in Fig. 3, this approximation breaks down for the architecture of Eq. 13 when $\mu_f - \mu_b$ becomes small. Here we improve the approximation by extending it to second order in $g_\lambda$. As before, consider a candidate eigenvector $v_\lambda(j) = g_\lambda(j)\, e^{i\omega j}$ and apply the matrix $W$ to get

$$(Wv_\lambda)(j) = e^{i\omega j} \sum_p c(j,p)\, g_\lambda(j-p)\, e^{-i\omega p}. \tag{21}$$

We again approximate both $c(j,p)$ and $g_\lambda(j-p)$, but carry the approximation of $g_\lambda$ to second order:

$$c(j,p) \approx c(j_0,p) + (j-j_0)\, \frac{\partial c}{\partial j}(j_0,p), \qquad g_\lambda(j-p) \approx g_\lambda(j) - p\, \frac{dg_\lambda}{dj}(j) + \frac{p^2}{2}\, \frac{d^2 g_\lambda}{dj^2}(j).$$

For simplicity, we have written the coefficients in the expansions of $c$ and $g_\lambda$ in terms of their derivatives (i.e. as Taylor series); this is exact in the continuous limit (see the discussion of this issue in Section 2). Substituting these approximations gives

$$(Wv_\lambda)(j) = e^{i\omega j}\Bigg[\, g_\lambda(j) \sum_p c(j_0,p)\, e^{-i\omega p} + (j-j_0)\, g_\lambda(j) \sum_p \frac{\partial c}{\partial j}(j_0,p)\, e^{-i\omega p} - \frac{dg_\lambda}{dj}(j) \sum_p p\, c(j_0,p)\, e^{-i\omega p} + \frac{d^2 g_\lambda}{dj^2}(j) \sum_p \frac{p^2}{2}\, c(j_0,p)\, e^{-i\omega p} - (j-j_0)\, \frac{dg_\lambda}{dj}(j) \sum_p p\, \frac{\partial c}{\partial j}(j_0,p)\, e^{-i\omega p} \Bigg]. \tag{22}$$

The zeroth-order term gives us the eigenvalue:

$$\lambda(j_0,\omega) = \sum_p c(j_0,p)\, e^{-i\omega p}.$$

To make this exact we require the higher-order terms to vanish. When the second-order term is small, this is equivalent to requiring that the first-order term vanish, and this gave us a first-order differential equation for $g_\lambda$ (Eq. 9). If, however, the first-order expansion does a poor job of capturing the behavior of the eigenvector around the point of interest, then we require the sum of the first- and second-order terms to vanish, giving a second-order differential equation in $g_\lambda$.
The coefficients in Eq. 22 are linear in j. If we had expanded c(j, p) to second-order, these coefficients would be quadratic in j. The coefficients also involve sums over p. Since these do not depend on j, they will not affect the form of the differential equation. However, they encode the dependence of the parameters of the solution on the parameters of the connectivity and so it will be useful to calculate them in closed form. In the next section, we solve Eq. 22 for a special case.
6 Second-order expansion for a network with a gradient of self-coupling strengths

The solutions to Eq. 22 are typically not Gaussian and can take various functional forms. Here we solve for the family of solutions that arises when connectivity is translationally invariant except for the self-couplings (i.e. the matrix varies only along the main diagonal).
The constraint that connectivity is translationally invariant except along the main diagonal is equivalent to requiring that $\partial c(j,p)/\partial j$ vanish except at $p = 0$. This simplifies Eq. 22 to yield

$$\left(\sum_p \frac{p^2}{2}\, c(j_0,p)\, e^{-i\omega p}\right) \frac{d^2 g_\lambda}{dj^2}(j) - \left(\sum_p p\, c(j_0,p)\, e^{-i\omega p}\right) \frac{dg_\lambda}{dj}(j) + \left.\frac{\partial c}{\partial j}\right|_{(j_0,0)} (j - j_0)\, g_\lambda(j) = 0. \tag{23}$$

For notational convenience, abbreviate the parameters in this equation as

$$\beta_1 = -\left.\frac{\partial c}{\partial j}\right|_{(j_0,0)} \Big/ \sum_p \frac{p^2}{2}\, c(j_0,p)\, e^{-i\omega p}, \qquad \beta_2 = \frac{\sum_p p\, c(j_0,p)\, e^{-i\omega p}}{2 \sum_p \frac{p^2}{2}\, c(j_0,p)\, e^{-i\omega p}}. \tag{24}$$

The general solution to Eq. 23 is then

$$g_\lambda(j) = e^{\beta_2 (j - j_0)} \left[ C_1\, \mathrm{Ai}\!\left(\beta_1^{1/3}(j - j_0) + \frac{\beta_2^2}{\beta_1^{2/3}}\right) + C_2\, \mathrm{Bi}\!\left(\beta_1^{1/3}(j - j_0) + \frac{\beta_2^2}{\beta_1^{2/3}}\right) \right].$$

Here Ai and Bi are Airy functions [2] and $C_1$ and $C_2$ are constants. $\mathrm{Bi}(x)$ diverges supra-exponentially as $x \to \infty$ and falls off slowly as $x \to -\infty$ [2], and so a localized eigenvector will only involve the $\mathrm{Ai}(x)$ term. Thus we have

$$g_\lambda(j) = C_1\, e^{\beta_2 (j - j_0)}\, \mathrm{Ai}\!\left(\beta_1^{1/3}(j - j_0) + \frac{\beta_2^2}{\beta_1^{2/3}}\right). \tag{25}$$

Note that this function is asymptotically less localized than the Gaussians from a first-order expansion, which decay as $e^{-j^2}$: for increasing $j$, the function falls off as $e^{-kj^{3/2}}$, while for decreasing $j$ it falls off exponentially.

Thus far we have not assumed any particular functional form for the connectivity, since the functional form only changes the parameters in Eq. 25. We now use the connectivity of Eq. 13 to derive Eq. 5 in the main text. Recall that

$$c(j,p) = \begin{cases} \mu_0 + \Delta_r\, j & p = 0 \\ \mu_f\, e^{-p/l_c} & p > 0 \\ \mu_b\, e^{p/l_c} & p < 0 \end{cases}$$

and the eigenvalue is

$$\lambda(j_0, \omega) = \mu_0 + \Delta_r\, j_0 + \frac{\mu_f\, e^{-i\omega}}{e^{1/l_c} - e^{-i\omega}} + \frac{\mu_b\, e^{i\omega}}{e^{1/l_c} - e^{i\omega}}.$$

As $\mu_f - \mu_b \to 0$, $\beta_2$ becomes small and the exponential in Eq. 25 becomes increasingly flat. This delocalizes the leading edge of the eigenvector (see Fig. 3).
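The Airy-function solution of Eq. 25 can be verified directly against the second-order equation (Eq. 23), written compactly as $a\, g'' - b\, g' + c\, (j - j_0)\, g = 0$ with $a$, $b$, $c$ the three coefficients appearing there. The sketch below picks arbitrary illustrative real coefficients (so $\beta_1 = -c/a > 0$, $\beta_2 = b/2a$) and checks the residual using exact Airy-function derivatives ($\mathrm{Ai}''(x) = x\, \mathrm{Ai}(x)$):

```python
import numpy as np
from scipy.special import airy

# Check that g(x) = exp(beta2 x) Ai(beta1^{1/3} x + beta2^2 / beta1^{2/3})
# solves  a g'' - b g' + c x g = 0,  where x = j - j0.
a, b, cc = 1.0, 0.6, -0.5          # illustrative coefficients only
beta1, beta2 = -cc / a, b / (2 * a)

x = np.linspace(-5.0, 5.0, 41)
s = beta1 ** (1.0 / 3.0) * x + beta2 ** 2 / beta1 ** (2.0 / 3.0)
Ai, Aip, _, _ = airy(s)            # airy() returns (Ai, Ai', Bi, Bi')

e = np.exp(beta2 * x)
g = e * Ai
g1 = e * (beta2 * Ai + beta1 ** (1.0 / 3.0) * Aip)
g2 = e * (beta2 ** 2 * Ai + 2 * beta2 * beta1 ** (1.0 / 3.0) * Aip
          + beta1 ** (2.0 / 3.0) * s * Ai)    # uses Ai''(s) = s Ai(s)

residual = a * g2 - b * g1 + cc * x * g
assert np.max(np.abs(residual)) < 1e-10
```

Because the derivatives are evaluated analytically, the residual is at the level of rounding error, confirming the algebra behind Eqs. 24-25.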