Mechanisms of distributed working memory in a large-scale model of the macaque neocortex

Working memory, the brain’s ability to retain and manipulate information internally, has been traditionally associated with persistent neural firing in localized brain areas such as those in the frontal cortex (Fuster 1973, Funahashi et al., 1989; Goldman-Rakic 1995; Romo et al., 1999; Rigotti et al., 2013; Kopec et al., 2015; Inagaki et al., 2019). However, self-sustained neural persistent activity during working memory is present in multiple brain regions (Romo et al., 2004; Christophel et al., 2017; Leavitt et al., 2017; Sreenivasan and D’Esposito, 2019), the underlying mechanism is unknown. We developed an anatomically constrained large-scale computational model of the macaque cortex endowed with a macroscopic gradient of synaptic excitation, to investigate the origin of distributed working memory representation. We found that long-range inter-areal reverberation can support the emergence of persistent activity patterns across multiple cortical regions, by virtue of a robust bifurcation in space, even when none of isolated local areas is capable of generating persistent activity. The model uncovered a host of distinct persistent activity patterns (attractor states), and provides experimentally testable predictions that cannot be explained in local circuit models. Simulating cortical lesions reveals that distributed activity patters are resilient against simultaneous lesions of multiple cortical areas, but depend on areas that form the core of the entire cortex. This work provides a theoretical framework for identifying large-scale brain mechanisms and computational principles of distributed cognitive processes.


Figure 1: Scheme and anatomical basis of a multi-regional macaque neocortex model. (a) Lateral view of the macaque cortical surface with areas in color. (b) In the model, inter-areal connections are calibrated by mesoscopic connectomic data
, each parcellated area is modeled by a population firing rate description with two selective excitatory neural pools and an inhibitory neural pool (Wong and Wang, 2006). (c) Correlation between spine count data (Elston, 2007) and anatomical hierarchy as defined by layer-dependent connections in Markov et al., (2014).
In local circuit models of working memory (WM) (Wang, 2001;Compte et al., 2006), areas high in the cortical hierarchy make use of sufficiently strong synaptic connections (notably involving NMDA receptors, see Wang et al., 2013) to generate self-sustained persistent activity. Specifically, the strength of local synaptic reverberation must exceed a threshold level (in our model, the local coupling parameter JS must be larger than a critical value of 0.4655), for an isolated local area to produce stimulus-selective persistent activity states that coexist with a resting state of spontaneous activity (operating in a multistable regime rather than in a monostable regime (Fig. 2a)). However, there is presently no conclusive experimental demonstration that an isolated cortical area like dorsolateral prefrontal cortex (dlPFC) is indeed capable of generating mnemonic persistent activity. In this study, we first examined the scenario in which all areas, including dlPFC (9/46d) at the top of the hierarchy, have JS values below the critical value for multistability (Fig. 2a). In this case, any observed persistent activity pattern must result from inter-areal connection loops. In a simulated delayed response task, a transient visual input excites a selective neural pool in the primary visual cortex (V1), which yielded activation of other visual areas such as MT during stimulus presentation (Fig. 2b, upper left). After stimulus withdrawal, neural activity persists in multiple areas across frontal, temporal and parietal lobes (Fig. 2b, lower right). This activation pattern was stimulus specific, so only the neural pool selective for the shown stimulus in each cortical area displayed elevated persistent activity ( Fig. 2c; Extended Data Fig. 3). The same result is obtained when stimulating other sensory modalities (Extended Data Fig. 4) and, if simplified AMPA dynamics is considered, also for brief stimuli (Extended Data Fig. 5). We observed cross-area variations of neural dynamics: while areas like TEpd displayed a sharp binary jump of activity, areas like LIP exhibited a more gradual ramping activity, resembling temporal accumulation of information in decision-making (Shadlen and Newsome, 2001).
Consistent with experimental observations (Leavitt et al., 2017), early sensory areas such as V1 and MT did not display persistent activity. This is ensured in our model by a combination of two factors. First, local synaptic coupling of early sensory areas at the bottom of the hierarchy are weak. Second, a certain level of preferential targeting inhibitory neurons by top-down projections prevents indiscriminate persistent activation across all cortical areas (Fig. 2d). Such bias towards inhibitory neurons of feedback projections, supported by experimental evidence (Tsushima et al., 2006), also allows the distributed WM patters to exist for a wide range of global scaling values for long-range synaptic strengths (Fig. 2e).
When we plotted the firing rate of stimulus-selective persistent activity across 30 areas along the hierarchy, our results revealed a gap that separated the areas displaying persistent activity and those that did not (Fig. 2f). This is a novel type of bifurcation or abrupt transition of behavior that takes place in space, rather than as a function of a network parameter like in Fig. 2a. As a matter of fact, the relevant parameter here is the strength of synaptic excitation that varies across cortical space (Extended Data Fig. 6), in the form of a macroscopic gradient. Therefore, bifurcation is robust and does not require precisely tuning a parameter to a threshold value.   We recognized that a large-scale circuit can potentially display a large number of distributed persistent activity patterns (attractors), some of them may not be accessible by stimulation of a primary sensory area. Note that distinct attractor states are defined here in terms of their spatial patterns, not stimulus tuning; for the sake of simplicity we assumed only two selective neural pools per local area. We developed a numerical approach to identify and count distinct attractors. Our aim is not to exhaustively identify all possible attractors (as the parameter space is too large for that), but to get insight on how our estimations depend on relevant parameters such as the global scaling factor G of all local and long-range excitatory connections, or and the maximum area-specific synaptic strength Jmax. Five examples are shown in Fig. 3a. Our analysis included four cases; two of them with the strength of local and long-range connectivity above the bifurcation threshold for certain areas high in the hierarchy (Fig. 3b), so that some areas like dlPFC have strong enough local reverberation to sustain activity independently, while other areas like TEpd and LIP require long-range support to participate in WM. In all four cases, the number of attractors is a function of the scaling factor G, with an optimal G value maximizing the number of attractors (Fig. 3c). This optimal G value shifted towards lower values as the proportion of intrinsically multistable areas increased, with peak number of attractors simultaneously increasing (Fig. 3d). Across all four cases and G values considered, we found a significant positive correlation between the number of areas involved in a given attractor and the average activity level of these areas (Fig. 3e). With a high proportion of intrinsically multistable areas, attractors tend to be largely restricted to these multistable areas (located at the top of the hierarchy), while in cases with zero or low proportion of multistable areas attractors involve a larger number of areas and are more diverse in their area composition (Fig. 3f). The distributed nature of WM has implications for the impact of perturbations on performance. We simulated a delayed response task with distractors ( Fig. 4a), in which if the to-be-remembered cue is A, B is presented as distractor during the delay period (and vice versa), and found that only strong distractors (compared to the original visual cue) are able to destabilize the memory (Fig. 4b). This is due to the robustness of a distributed attractor compared to a local circuit mechanism, but also to the inhibitory effect of feedback projections which dampen the propagation of distractor signals (cf. responses in V1 and MT).
Finally, we tested the effect of lesioning cortical areas in WM tasks. Lesions of individual areas have only a local effect on the overall maintenance of the distributed attractor, even when lesioning prefrontal areas (Fig. 4c). The number of areas involved in the evoked attractor decreases only linearly with the number of simultaneously (randomized) lesioned areas (Fig. 4d) and decreases a bit more abruptly when lesioning areas in reverse hierarchical order (Extended Data Fig. 7). A more systematic evaluation revealed that lesioning most areas have limited consequences for the total number of available attractors, with the exception of several temporal and prefrontal areas for which the overall impact is large ( Fig. 4e; Extended Data Fig. 7). Interestingly, the latter areas are part of the anatomical 'bowtie hub' of the macaque cortex (Fig. 4f, Markov et al., 2013).
In summary, we have presented a large-scale circuit mechanism of distributed working memory, Bouchacourt and Buschman, 2019). Our model yields several experimentally testable predictions in monkey and rodent experiments, including (i) the need of strong large-scale interactions to sustain distributed WM patterns, (ii) a positive correlation between the number of areas involved in a WM task and their average firing rate of persistent activity, and (iii) robustness of working memory encoding against distractors via inter-areal inhibitory feedback. This model and its extensions can serve as a computational platform to elucidate complex experimental observations, such as inactivation by optogenetic method of various cortical areas in behaving animals performing a working memory task, in future research. Conceptually, this work showed a novel form of bifurcation in space as a mechanism to generate differential functions across different cortical areas, and represents the first large-scale cortical model for distributed cognitive processes. Markov et al., 2013).

Anatomical data
The anatomical connectivity data used has been gathered in an ongoing track tracing study in macaque and has been described in detail elsewhere (Markov et al., 2012;Mejias et al., 2016). Briefly, retrograde tracer injected into a given target area labels neurons in a number of source areas projecting to the target area. By counting the number of labeled neurons on a given source area, Markov et al. defined the fraction of labeled neurons (FLN) from that source to the target area. FLN can serve as a proxy for the 'connection strength' between two cortical areas, which yields the connectivity pattern of the cortical network (Extended Fig. 1a, b). In addition, Markov et al. also measured the number of labeled neurons located on the supragranular layer of a given source area. Dividing this number over the total number of labeled neurons on that source area, we can define the supragranular layered neurons (SLN) from that source area to the target area (Extended Fig. 1c, d).
SLN values may be used to build a well-defined anatomical hierarchy (Felleman and Van Essen, 1991;Markov et al. 2014). Source areas located lower (higher) than the target area in the anatomical hierarchy, as defined by Felleman and Van Essen (1991), display a progressively higher (lower) proportion of labeled neurons in the supragranular layer. As a consequence, the lower (higher) the source area relative to the target area, the higher (lower) the SLN values of the source-to-target projection.
Iterating these measurements across other anatomical areas yields and anatomical connectivity matrix with weighted directed connections and an embedded structural hierarchy. The 30 cortical used to build our data-constrained large-scale brain network are, in hierarchical order: V1, V2, V4, DP, MT, 8m, 5, 8l, 2, TEO, F1, STPc, 7A, 46d, 10, 9/46v, 9/46d, F5, TEpd, PBr, 7m, LIP, F2, 7B, ProM, STPi, F7, 8B, STPr and 24c. Finally, data on wiring connectivity distances between cortical areas is available for this dataset as well, allowing to consider communication time lags when necessary (we found however that introducing time lags this way does not have a noticeable impact on the dynamics of our model). The connectivity data used here is available to other researchers from core-nets.org.
The corresponding 30x30 matrices of FLN and SLN are shown in Extended Fig. 1b, d. Areas in these matrices are arranged following the anatomical hierarchy, which is computed using the SLN values and a generalized linear model (Chaudhuri et al., 2015;Mejias et al., 2016). Surgical and histology procedures were in accordance with European requirements 86/609/EEC and approved by the ethics committee of the region Rhone-Alpes.
In addition to the data on FLN and SLN across 30 cortical areas, we used additional data to constrain the area-to-area differences in the large-scale brain network. In particular, we have collected data on the total spine count of layer 2/3 pyramidal neuron basal dendrites across different cortical areas, as the spine count constitutes a proxy for the density of synaptic connections within a given cortical area (see Elston, 2007 for a review). A full list of all area-specific values of spine densities considered and their sources is given below:  Fig. 2 for the effect of this correction on the overall gradient established by the spine count data, and the correlation of such gradient with the SLN hierarchy.

Local neural circuit
We describe the neural dynamics of the local microcircuit representing a cortical area with the Wong-Wang model (Wong and Wang, 2006). In its three-variable version, this model describes the temporal evolution of the firing rate of two input-selective excitatory populations as well as the evolution of the firing rate of an inhibitory population. All populations are connected to each other (see Fig. 1a). The model is described by the following equations: Here, SA and SB are the NMDA conductances of selective excitatory populations A and B respectively, and SC is the GABAergic conductance of the inhibitory population. Values for the constants are τN=60 ms, τG=5 ms, γ=1.282 and γI=2. The variables rA, rB and rC are the mean firing rates of the two excitatory and one inhibitory populations, respectively. They are obtained by solving the transcendental equation = ( ) at each time step, with Ii being the input to population I, given by In these expressions, Js, Jc are the self-and cross-coupling between excitatory populations, respectively, JEI is the coupling from the inhibitory populations to any of the excitatory ones, JIE is the coupling from any of the excitatory populations to the inhibitory one, and JII is the self-coupling strength of the inhibitory population. The parameters I0i with i=A, B, C are background inputs to each population. Parameters are Js=0.3213 nA, Jc=0.0107 nA, JIE=0.15 nA, JEI=-0.31 nA, JII=-0.12 nA, I0A=I0B=0.3294 nA and I0C=0.26 nA. Later we will modify some of these parameters in an area-specific manner (in particular Js and JIE) to introduce a gradient of properties across the cortical hierarchy. The term I i net denotes the long-range input coming from other areas in the network, which we will keep as zero for now but will be detailed later. Sensory stimulation can be introduced here as extra pulse currents of strength Ipulse =0.2 and duration Tpulse =0.5 sec (unless specified otherwise).
The last term xi(t) with i=A, B, C is an Ornstein-Uhlenbeck process, which introduces some level of stochasticity in the system. It is given by Here, ξi(t) is a Gaussian white noise, the time constant is τnoise=2 ms and the noise strength is σA,B=0.005 nA for excitatory populations and σC=0 for the inhibitory one. (Eq. 8) The values for the parameters are a=135 Hz/nA, b=54 Hz and d=0.308 s. For the inhibitory population a similar function can be used, but for convenience we choose a threshold-linear function: The values for the parameters are gI=4, c1=615 Hz/nA, c0=177 Hz and r0=5.5 Hz. Finally, it is sometimes useful for simulations (although not a requirement) to replace the transcendental equation = ( ) by its analogous differential equation, of the form The time constant can take a typical value of τr=2 ms.

Gradient of synaptic strengths
Before considering the large-scale network and the inter-areal connections, we look into the area-toarea heterogeneity to be included in the model.
Our large-scale cortical system consists of N=30 local cortical areas, for which inter-areal connectivity data is available. Each cortical area is described as a Wong-Wang model of three populations like the ones described in the previous section. Instead of assuming areas to be identical to each other, here we will consider some of the natural area-to-area heterogeneity that has been found in anatomical studies. For example, work from Elston (2007) has identified a gradient of dendritic spine density, from low spine numbers found in early sensory areas to large spine counts found in higher cognitive areas. This may reflect an increase of local recurrent strength as we move from sensory to association areas. In addition, cortical areas are distributed along an anatomical hierarchy ( In the following, we will assign the incoming synaptic strength (both local and long-range) of a given area as a linear function of the dendritic spine count values observed in anatomical studies, with agerelated corrections when necessary. Alternatively, when spine count data is not available for a given area, we will use its position in the anatomical hierarchy, which displays a high correlation with the spine count data, as a proxy for the latter. After this process, the large-scale network will display a gradient of local and long-range recurrent strength, with sensory/association areas showing weak/strong local connectivity, respectively. We denote the local strength value of a given area i in this gradient as hi, and this value normalized between zero (bottom of the gradient, area V1) and one.
We assume therefore a gradient of values of Js, with its value going from Jmin to Jmax. Having large values of Js for association areas strongly affects the spontaneous activity of these areas, even without considering inter-areal coupling. A good way to keep the spontaneous firing rate of these areas within physiologically realistic limits is to impose that the spontaneous activity fixed point is the same for all areas (Murray et al., 2017b). To introduce this into the model, we take into account that the solutions in the spontaneous state are symmetrical: SA=SB=S (we assume zero noise for simplicity). The current entering any of the excitatory populations is then (assuming I0A=I0B=I0): Assuming a fast dynamics for rC and SC (mediated by GABA) as compared to SA and SB (mediated by NMDA) we can obtain the approximate expression for SC: The equation for the excitatory current has then the form To maintain the excitatory input (and therefore the spontaneous activity level S) constant while varying Js across areas, we just have to keep the quantity + + 2 ≡ 0 constant (for the original parameters of the isolated area described above, we obtain J0=0.2112 nA). A good choice, but not the only one, is to assume that the excitatory synapses to inhibitory neurons, JIE, also scales with the ranks and with Js accordingly: This linear relationship ensures that the spontaneous solution is the same for all areas in the network.
Since JIE needs to be non-negative, this imposes a minimum value of Jmin=0.205 nA for Js. The particular maximum value of Js, namely Jmax, will determine the type of WM model we assume. Since the bifurcation point of an isolated area is at Js=0.4655 nA for this set of parameter values, setting Jmax below that value implies that all areas in the network are monostable in isolation. In this situation, any persistent activity displayed by the model will be a consequence of a global, cooperative effect due to inter-areal interactions. On the other hand, having Jmax above the bifurcation point means that some areas will be multistable when isolated, e.g. they will be intrinsically multistable and compatible with classical WM theories.
Unless specified otherwise, we assume a range of Jmin=0.21 nA and Jmax=0.44 nA (i.e. below the critical value), so that the model displays distributed WM.

Inter-areal projections
We now consider the inter-areal projections connecting isolated areas to form the large-scale cortical network. Assuming that inter-areal projections stem only from excitatory neurons (as inhibitory projections tend to be local in real circuits) and that such projections are selective for excitatory neurons, the network or long-range input term arriving at each of the populations of a given area y from all other cortical areas is given by Here, a superindex denotes the cortical area and a subindex the particular population within each area. The sum in all equations runs over all cortical areas of the network (N=30). Excitatory populations A and B receive long-range inputs from equally selective units from other areas, while inhibitory populations receive inputs from both excitatory populations. Therefore, neurons in population A of a given area may be influenced by A-selective neurons of other areas directly, and by B-selective neurons of other areas indirectly, via local interneurons.
G is the global coupling strength, which controls the overall long-range projection strength in the network (G=0.48 unless specified otherwise). Z is a factor that takes into account the relative balance between long-range excitatory and inhibitory projections. Setting Z=1 means that both excitatory and inhibitory long-range projections are equally strong, but this does not guarantee that their effect is balanced in the target area, due to the effect of local connections. Following Murray et al., (2017b), we choose to impose a balance condition that guarantees that, if populations A and B have the same activity level, their net effect on other areas will be zero -therefore highlighting the selectivity aspect of the circuits. Considering that the transfer function of inhibitory populations is linear and their approximately linear rate-conductance relationship, it can be shown that Aside from global scaling factors, the effect of long-range projections from population x to population y is influenced by two factors. The first one, W xy , is the anatomical projection strength as revealed by tract-tracing data from Markov et al. (2013). We use the fraction of labelled neurons (FLN) from population x to y to constrain our projections values to anatomical data. We rescale these strengths to translate the broad range of FLN values (over five orders of magnitude) to a range more suitable for our firing rate models. We use a rescaling that maintains the proportions between projection strengths, and therefore the anatomical information, that reads Here, the values of the rescaling are k1 =1.2 and k2 =0.3. The same qualitative behavior can be obtained from the model if other parameter values, or other rescaling functions, are used as long as the network is set into a standard working regime (i.e. signals propagate across areas, global synchronization is avoided, etc). FLN values are also normalized so that ∑ = 1. In addition, and as done for the local connections, we introduce a gradient of long-range projection strengths using the spine count data: → , where λx is one for the target area x with the maximal spine count, and decreases for other areas with the same slope as the gradient of the local connectivity presented above.
The second factor that needs to be taken into account is the directionality of signal propagation across the hierarchy. Feedforward (FF) projections that are preferentially excitatory constitute a reasonable assumption which facilitate signal transmission from sensory to higher areas. On the other hand, having feedback (FB) projections with a preferential inhibitory nature contributes to the emergence of realistic distributed WM patterns (Fig. 2d,e) (see also Markov et al., 2014;Tsushima et al., 2006). This feature can be introduced, in a gradual manner, by linking the different inter-areal projections with the SLN data, which provides a proxy for the FF/FB nature of a projection (SLN=1 means purely FF, and SLN=0 means purely FB). In the model, we assume a linear dependence with SNL for projections to excitatory populations and with (1-SLN) for projections to inhibitory populations, as shown above.
We limit the modulation of the FB projections between frontal areas to a maximum of 0.25 so that interactions between frontal areas are never strongly inhibitory, in agreement with evidence of frontal networks having strong excitatory loops (Markowitz et al., 2015). This consideration is important to allow FEF (areas 8l and 8m) to exhibit some level of persistent activity during distributed WM -as their hierarchical position and recurrent strength are not strong enough to sustain activity otherwisebut it does not affect the behavior of our model otherwise.

Data analysis
We developed a numerical method to estimate the number of stable distributed WM attractors for a particular set of parameters values of our large-scale model. This method is used to obtain the results shown in Figs. 3 and 4. To allow for a cleaner estimation, we do not consider noise in the neural dynamics during these simulations.
Our large-scale cortical model has 30 areas, with each of them having two selective excitatory populations A and B. Simply assuming that each of the areas can reach one of three possible states (persistent activity in A, persistent activity in B, or spontaneous activity) means that our model can potentially display up to 3 30 attractor combinations. This number can be even larger if we refine the firing rate reached by each area rather than simply its persistent/non-persistent activity status. Since it is not possible to fully explore this extremely large number of possible attractors, we devised a strategy based on the exploration of a sample of the input space of the model. The core idea is to stimulate the model with a certain input pattern (targeting randomized areas) and registering the fixed point that the dynamics of the model converges to. By repeating this process with a large number of input combinations and later counting the number of different attractors from the obtained pool of fixed points, we can obtain an estimate of the number of attractors for a particular set of parameter values.

Stimulation protocol
A given input pattern is defined as a current pulse of fixed strength (Ipulse =0.2) and duration (Tpulse =1 sec) which reaches a certain number P of cortical areas. Only one population (A or B, randomized) in each area receives the input, and the P cortical areas receiving the input are randomly selected across the top 16 areas of the spine count gradient. This decreases the amount of potential input combinations we have to deal with by acknowledging that areas with stronger recurrent connections (such as 9/46d) are more likely to be involved in distributed WM patterns than those with weaker connections (such as MT). P can take any value between one and Pmax =16, and we run a certain number of trials (see below) for each of them. Different values of Ipulse and Tpulse, as well as setting the randomly selected areas at a high rate initial condition instead of providing an external input, have been also explored and lead to qualitatively similar results.
It is also important to consider that not all values of P have the same number of input combinations. For example, P=1 allows for 16*2=32 different input combinations (if we discriminate between populations A and B), while P=2 allows for 16*(16-1)*2=480 input combinations, and so on. For a given value of P, the number of possible input combinations Nc is given by By summing all values of Nc for P=1, …Pmax, we obtain around 43 million input combinations, which are still too many trials to simulate for a single model configuration. To simplify this further, we consider a scaling factor Fc on top of Nc to bring down these numbers to reasonable levels for simulations. We use Fc =0.0002 (or 0.02% of all possible combinations) for our calculations, which brings down the total number of simulated input combinations to around 9000. Other options, such as decreasing Pmax and using a larger scaling factor (Pmax =12, Fc =0.01 or 1% or all possible combinations) give also good results. Since the rescaling can have a strong impact for small P (yielding a number of trials smaller than one), we ensure at least one trial for these cases.
To guarantee the stability of the fixed points obtained during these simulations, we simulate the system during a time windows of 30 seconds (which is much larger than any other time scale in the system), and check that the firing rates have not fluctuated during the last 10 seconds before we register the final state of the system as a fixed point.

Estimating the number of attractors
The final step is to count how many different attractors have been reached by the system, by analyzing the pool of fixed points obtained from simulations. A simple way to do this is to consider that, for any fixed point, the state of each area can be classified as persistent activity in population A (i.e. mean firing rate above a certain threshold of 10 spikes/s), persistent activity in population B, or spontaneous activity (both A and B are below 10 spikes/s). This turns each fixed point into a vector of 30 discrete states, and the number of unique vectors among the pool of fixed points can be quickly obtained using standard numerical routines in Matlab (such as the 'unique' function).
A more refined way to count the number of attractors, which we use in this work, is to define an Euclidean distance to discriminate between an attractor candidate and any previously identified attractors. Once the first attractor (i.e. the first fixed point analyzed) is identified, we test whether the next fixed point is the same than the first one by computing the Euclidean distance Ed between them: where n=30 is the total number of areas in the network (only one of the populations, A or B, needs to be considered here). If Ed is larger than a certain threshold distance ε, we consider it a new attractor. We choose ε=0.01, which grossly means that two fixed points are considered as different attractors if, for example, the activity of one of their cortical areas differs by 0.5 spikes/s and the activity on all other areas is the same for both. The particular value of ε does not have a strong impact on the results (aside from the fact that smaller values of ε gives us more resolution to find attractors). When several attractors are identified, each new candidate is compared to all of them using the same method.
Both the first and the second method to count attractors deliver qualitatively similar results (in terms of the dependence of the number of attractors with model parameters), although as expected the second method yields larger numbers due to its higher discriminability.

Code availability
The code of the present model will be released upon publication of this manuscript.

Data availability
All data used in this manuscript is already available at core-nets.org or from the cited literature.

Extended references
Extended Data Figure 3: Behavior of all areas in the network during the visual WM task. Left: spatial maps of the simulated macaque brain during stimulation and delay period, with activity color coded. Right: evolution of the firing rate of all areas in the network. Stimulation occurs in V1 at t=4 seconds and has a duration of 500ms. Figure 5