Independent Activity Subspaces for Working Memory and Motor Preparation in the Lateral Prefrontal Cortex

The lateral prefrontal cortex is involved in the integration of multiple types of information, including working memory and motor preparation. However, it is not known how downstream regions can extract one type of information without interference from the others present in the network. Here we show that the lateral prefrontal cortex contains two independent low-dimensional subspaces: one that encodes working memory information, and another that encodes motor preparation information. These subspaces capture all the information about the target in the delay periods, and the information in both subspaces is reduced in error trials. A single population of neurons with mixed selectivity forms both subspaces, but the information in the two subspaces is kept largely independent. A bump attractor model with divisive normalization replicates the properties of the neural data. These results have implications for the neural mechanisms of cognitive flexibility and capacity limitations.


Introduction
Complex flexible behaviors require the integration of multiple types of information, including information about sensory properties, task rules, items held in memory, items being attended, actions being planned, and rewards being expected, among others. A large proportion of neurons in the lateral prefrontal cortex (LPFC) encode a mixture of two or more of these types of information 1-5. This mixed selectivity endows the LPFC with a high-dimensional representational space 1, but it also presents the challenge of understanding how downstream regions that receive mixed-selective input from the LPFC can read out meaningful information. One possible solution would be to have multiple low-dimensional information subspaces, embedded within the high-dimensional state space of LPFC, which could enable the independent readout of different types of information with minimal interference from changes of information in other subspaces 6-10. Information subspaces have been identified in the medial frontal cortex 11, lateral prefrontal cortex 7 and early visual areas 8. However, no studies to date have explicitly tested whether two independent information subspaces can coexist within a single biological neural network. Here,

Since the second subspace contained target information only after the distractor disappeared, and motor preparation presumably began after the last sensory cue that reliably predicted the timing of the Go cue (i.e. the offset of the distractor), we hypothesized that the second subspace corresponded to a motor preparation subspace. Activity between the Go cue and the saccade onset contained information about saccade execution (45% of LPFC neurons we recorded were selective in the period between the Go cue and saccade onset, assessed using a one-way ANOVA, P < 0.05).
In order to test whether the second subspace corresponded to a motor preparation subspace, we compared the results of the original decomposition using Delay 1 and Delay 2 activities with a new decomposition using Delay 1 and pre-saccadic period activities (150 ms to 0 ms prior to saccade). If the second subspace corresponded to a motor preparation subspace, we should observe similarities between the second component in both decompositions. In the new decomposition, we obtained two components with relative vector magnitudes similar to those found in the first decomposition (for Component 1', the vector magnitude in the pre-saccade period was 70% of that in Delay 1, while for Component 2', the vector magnitude in Delay 1 was 0% of that in the pre-saccade period, Supplementary Fig. 5). We found that Component 1 and Component 1' were significantly correlated (

In an additional test of the hypothesis that the second subspace corresponded to a motor preparation subspace, we examined the relationship between Component 2 and Component 2' at the level of single cells. First, we identified cells with spatial tuning in both Delay 2 and the pre-saccade period (73 cells, two one-way ANOVAs, both P < 0.05). Then, for each cell, we measured the correlation between the activity in Component 2 and Component 2' across different target locations. We found that 47% of these neurons showed significant correlation (Pearson correlation, P < 0.05), which exceeded the number expected by chance (Fig. 2b, left bar, P < 0.001, g = 10.82). As a control, we carried out the same analysis between Component 1 and Component 2', and found no evidence of a higher number of correlated cells than expected by chance (Fig. 2b, right bar, P > 0.19, g = 1.51). Examples of neurons with significant correlation are shown in Figure 2c.
This result provided additional support to our hypothesis that the second subspace corresponded to a motor preparation subspace. Thus, henceforth, we will refer to the second subspace as the "motor preparation subspace".

working memory or motor preparation selectivity within the LPFC, or (2) the same population of

distinguish between these two possible mechanisms, we projected the unit vector representing

inter-cluster distance was then normalized by the average intra-cluster distance for all clusters, which was a measure of trial-by-trial variability in the population response. This inter-to-intra cluster distance ratio was compared between projections of working memory activity into the working memory subspace (Fig. 4d: projMSub(M)), and projections of working memory and motor preparation activity into the working memory subspace (Fig. 4d: projMSub(M+P)). A similar analysis was carried out in the motor preparation subspace (Fig. 4e). We found a small decrease (7.1%) of the inter-to-intra cluster distance when both working memory and motor preparation components were projected into the subspaces, suggesting the existence of a small, but significant interference between both subspaces (P < 0.001, g = 4.81 between projMSub(M) and projMSub(M+P), P < 0.001, g = 6.63 between projPSub(P) and projPSub(M+P)).
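The inter-to-intra cluster distance ratio described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text defining the inter-cluster distance is not available here, so the sketch assumes it is the mean Euclidean distance between cluster centroids, and that the intra-cluster distance is the mean distance of single trials to their own centroid. The function name `inter_to_intra_ratio` is hypothetical.

```python
import numpy as np

def inter_to_intra_ratio(points, labels):
    """Mean centroid-to-centroid distance divided by the mean
    trial-to-own-centroid distance (both Euclidean; an assumption)."""
    points = np.asarray(points, dtype=float)   # (n_trials, n_dims), e.g. subspace projections
    labels = np.asarray(labels)                # target location of each trial
    classes = np.unique(labels)
    centroids = np.stack([points[labels == c].mean(axis=0) for c in classes])

    # Inter-cluster distance: average over all pairs of centroids.
    i, j = np.triu_indices(len(classes), k=1)
    inter = np.linalg.norm(centroids[i] - centroids[j], axis=1).mean()

    # Intra-cluster distance: trial-by-trial spread around each centroid,
    # averaged across clusters.
    intra = np.mean([
        np.linalg.norm(points[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    return inter / intra
```

Under these assumptions, a lower ratio for projMSub(M+P) than for projMSub(M) would correspond to clusters that are closer together relative to their trial-by-trial spread, which is how interference is read out in the comparison above.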

The two subspaces accounted for all the target information in the full space. We identified 2 independent subspaces that accounted for working memory and motor preparation signals. However, it was possible that there were other subspaces that also contained target location information. If this was true, information about target location should be present in the subspace that complemented the 6-dimensional working memory and the 6-dimensional motor preparation subspaces (i.e. the null space that has 117 effective dimensions that explain 95% total variance). In order to assess this possibility, we trained and tested a decoder on activity projected into the null space. We found that the null space contained no information (Fig. 4c, projNull(M+P), P > 0.59, g = 0.01 compared to chance), which meant that the two subspaces captured all the decodable information in the full space. This observation suggested that the 2 subspaces we identified were the only ones that contained target information in the LPFC in our task.

Less information was found in error trials in both subspaces. In a subset of trials, the animals maintained fixation until the Go cue, but failed to report the correct target location with a saccade. These failures could be due to the animals reporting other locations, including the of the monitor. Classifiers trained on working memory activity of correct trials projected into the working memory subspace, projMSub(M), were tested on working memory activity of error trials, which was also projected into the working memory subspace, projMSub(ME). Decoding performance was significantly reduced in error trials compared to correct trials (Fig. 4a, P < 0.001, g = 14.8 between projMSub(M) and projMSub(ME)), suggesting that failures in memory encoding occurred during error trials. A similar analysis on the motor preparation subspace yielded equivalent results (Fig.
4b, P < 0.001, g = 14.8 between projPSub(P) and projPSub(PE)), which were consistent with the fact that in error trials, saccades were made to different locations than in correct trials. These results suggested that the subspaces we found could have been used by the animals to perform the task.

projected into the working memory subspace; projMSub(M+P), decoding of the full space activity projected into the working memory subspace; projMSub(ME), decoding of the working memory activity in error trials projected into the working memory subspace using a classifier built on working memory activity in correct trials projected into the working memory subspace; b, P stands for motor preparation activity. Same conventions as in a, but for motor preparation activity and the motor preparation subspace. We verified that the drop in performance in error trials was specific to the two subspaces, and not due to a non-specific increase in noise in the population (see Methods). c, projNull(M+P), decoding of the full space activity projected into the null subspace. d, Inter-to-Intra cluster ratio of working memory activity projected into the working memory subspace (projMSub(M)), and of full space activity projected into the working memory subspace (projMSub(M+P)). e, Same conventions as in d, but for motor preparation activity and motor preparation subspace.

A bump attractor artificial neural network with divisive normalization recapitulated the properties of LPFC activity. An unexpected observation in our results was that decoding performance in the working memory subspace decreased in Delay 2 compared to Delay 1 (Fig. 1d). This decrease coincided with an increase of decoding performance in the motor preparation subspace (Fig. 1e).
A state space analysis revealed that the decrease of working memory decoding performance was due to a decrease in the inter-to-intra cluster distance of working memory signals in Delay 2, and the increase of motor preparation decoding performance was due to an increase in the inter-to-intra cluster distance of motor preparation signals in Delay 2 (Supplementary Fig. 7). In addition, we noticed that the mean population firing rate did not change between Delay 1, Delay 2 and pre-target fixation periods (Supplementary Fig. 8). This observation was consistent with a population normalization mechanism to maintain the mean population firing rate at a constant level in the LPFC 12,13. In order to assess whether a normalization mechanism was responsible for the decrease of working memory information in Delay 2, we built artificial neural network models with and without population normalization, and compared their behavior with the LPFC data.

Bump attractor models have been shown to replicate several properties of LPFC activity, including code-morphing in full space, the existence of a stable subspace with stable working memory information, and non-linear mixed selectivity of individual neurons 7,14,15. Here, we created a model that incorporated subsets of neurons that represented information in the working memory and motor preparation subspaces (Fig. 5a), and looked at the effect of adding divisive normalization to keep the mean population firing rate constant. We constrained the model to utilize neurons with mixed selectivity (Fig. 3a) by matching the selectivity properties to those found in the LPFC data (Supplementary Fig. 9). We also used the same decorrelation method to identify working memory and motor preparation subspaces from the responses in the model (Supplementary Fig. 10).
In both models with and without normalization, a stable working memory bump appeared in Delay 1 after a subset of neurons representing one of the working memory locations was activated (shown in the bottom left of Fig. 5b and 5d, where the cross-temporal decoding performance in LP11 was 75.8 ± 6.9% and 77.0 ± 9.6% respectively). After a second subset of neurons representing motor preparation was activated in Delay 2, the model without normalization exhibited an elevated decoding performance in the full space (93.1 ± 3.5% in LP22, Fig. 5b) due to the correlated location between working memory and motor preparation, which increased the distances between the clusters in the state space (Fig. 5c). There was also no reduction of performance in the working memory subspace in Delay 2 (76.5 ± 8.3% in LP11, 77.8 ± 8.1% in LP22, P > 0.86, g = 0.28), and the mean population activity increased from Delay 1 (1.2 ± 0.04 spikes/s) to Delay 2 (1.4 ± 0.06 spikes/s, P < 0.05, g = 4.04). These three results were inconsistent with our observations from the neuronal data. On the other hand, in the model with normalization, the decoding performance in the full space remained the same in both delay periods (LP22 - LP11 overlapped with 0, P > 0.78, g = 0.61, Fig. 5d), the information in the working memory subspace was reduced in Delay 2 as in the neural data (82.6 ± 9.4% in LP11, 56.1 ± 5.4% in LP22, P < 0.05, g = 3.42, Fig. 5e), and the mean population firing rate did not change (D1: 1.0 ± 0.04 spikes/s, D2: 1.0 ± 0.04 spikes/s, P > 0.9, g = 0.05). Finally, as expected, target information in the motor preparation subspace emerged in Delay 2 (Fig. 5f)

Here we demonstrate that two independent subspaces coexist within the LPFC population. These subspaces contain largely independent information about target location, and appear to encode working memory and motor preparation information.
We show that there is a small, but significant interference of information when both subspaces encode information simultaneously, and that during error trials information in the working memory subspace is reduced. Assessment of response properties of individual neurons revealed that a single population of neurons with mixed selectivity generates both subspaces. Finally, we show that a bump attractor neural network model with divisive normalization can capture all the properties described. Overall, our results show that working memory and motor preparation subspaces coexist in a single neural network within the LPFC.

It is important to minimize interference between different types of information. For example, a visual area may read out working memory information 17,18, while a premotor region may read out motor preparation information from the LPFC 17,19,20. If large interferences existed between subspaces, the computations of downstream regions would be compromised. We found a small, but significant interference between the subspaces, such that some working memory information was reflected in the motor preparation subspace (and vice versa). It is not surprising that there is some degree of interference, since the method we used to decompose the signals did not impose a constraint to ensure maximal orthogonality between subspaces, and while the mutual information was low, it was not zero. To assess whether imposing orthogonality between subspaces was feasible, we attempted to decompose the signals by minimizing the inner product rather than the mutual information. However, this method did not provide a unique solution for the decomposition, and the best decoding performance and interference obtained by this method were not significantly different from those obtained by the method of minimum mutual information (Supplementary Fig. 12). We also considered alternative methods to decompose the signal, but these produced subspaces with larger interferences (Supplementary Figs. 13 and 14). The interference we found suggests that under conditions that stress the working memory and motor preparation systems (such as a task that requires the concurrent memorization of 4 targets and preparation of 4 movements) a predictable bias should be observable for both the recalled target locations and eventual movements. This prediction remains to be tested.

We also found an indirect way in which information in the subspaces interferes with each other: divisive normalization of population activity.
This led to a decrease of working memory information in Delay 2 once motor preparation information emerged. Divisive normalization, which has been described before in the LPFC 12,13, could be useful as an energy saving mechanism, since it maintains the population activity at a low level when new information is added. A bump attractor model with divisive normalization allowed us to replicate the properties of LPFC activity. However, this model provides only high-level support for this mechanism, which still needs a mechanistic implementation.
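The indirect interference described above can be illustrated with a small numerical sketch. It assumes the simplest form of divisive normalization, in which the whole population vector is rescaled so that its mean equals a fixed baseline; the function name, the toy four-unit population, and all values are hypothetical and are not taken from the model in this paper.

```python
import numpy as np

def divisive_normalization(rates, target_mean):
    """Rescale a population vector so that its mean equals target_mean.
    Returns the rescaled vector and the common scale factor."""
    scale = target_mean / rates.mean()
    return rates * scale, scale

# Toy population of 4 units (hypothetical values).
baseline = np.ones(4)                        # mean rate 1.0
memory = np.array([0.5, -0.5, 0.0, 0.0])     # zero-mean working memory signal
prep = np.full(4, 0.2)                       # positive-mean motor preparation signal

# Delay 1: only the memory signal is present, so the mean is already 1.0.
delay1, scale1 = divisive_normalization(baseline + memory, target_mean=1.0)

# Delay 2: the preparation signal raises the mean to 1.2, so every
# component, including the memory signal, is divided by 1.2.
delay2, scale2 = divisive_normalization(baseline + memory + prep, target_mean=1.0)

memory_in_delay2 = memory * scale2           # memory signal shrinks by the common factor
```

In this sketch the memory signal is untouched in Delay 1 and shrinks by the common factor in Delay 2, while the population mean stays at baseline. This is the sense in which adding information to one subspace can reduce information in the other even when the two subspaces do not directly overlap.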

In this work, we derived two subspaces, and analyzed the benefits of decoding from those subspaces, from data in which the memory location and the motor preparation location were identical. However, there are situations where the LPFC is required to store multiple pieces of information that are uncorrelated, e.g. if the animal has to remember the location and color of a

can be extended to identify the relevant subspaces in tasks with uncorrelated information as well (see Methods). We show that in tasks with uncorrelated information, decoding in the full space could result in higher interference as compared to tasks with correlated information

followed the following steps. 24 hours prior to the surgery, the animals received a dose of

before the surgery. During surgery, the scalp was incised, and the muscles retracted to expose the skull. A craniotomy was performed (~2 x 2 cm). The dura mater was cut and removed from the craniotomy site. Arrays of electrodes were slowly lowered into the brain using a stereotaxic manipulator. Once all the arrays were secured in place, the arrays' connectors were secured on top of the skull using bone cement. A head-holder was also secured using bone cement. The piece of bone removed during the craniotomy was repositioned to its original location and secured in place using metal plates. The skin was sutured on top of the craniotomy site, and stitched in place, avoiding any tension to ensure good healing of the wound. All surgeries were conducted using aseptic techniques under general anesthesia (isoflurane 1-1.5% for maintenance). The depth of anesthesia was assessed by monitoring the heart rate and movement of the animal, and the level of anesthesia was adjusted as necessary.
Analgesics were provided during post-surgical recovery, including a Fentanyl patch (12.5 mg/2.5 kg 24 hours prior to surgery, and removed 48 hours after surgery), and Meloxicam (0.2-0.3 mg/kg after the removal of the Fentanyl patch). Animals were not euthanized at the end of the study.

Recording Techniques
Neural signals were initially acquired using a 128-channel and a 256-channel Plexon OmniPlex system (Plexon Inc., TX, USA) with a sampling rate of 40 kHz. The wide-band signals were band-pass filtered between 300 and 3000 Hz. Following that, spikes were detected using an automated

(trial epochs and eye data) for data analysis, we generated strobe words denoting trial epochs and performance (rewarded or failure) during the trial. These strobe words were generated on the

the center of the screen. While continuing to fixate, the animal was presented with a target (a red square) for 300 ms at any one of eight locations in a 3x3 grid. The center square of the 3x3 grid contained the fixation spot and was not used. The presentation of the target was followed by a delay of 1,000 ms, during which the animal was expected to maintain fixation on the white circle at the center. At the end of this delay, a distractor (a green square) was presented for 300 ms at any one of the seven remaining locations (other than where the target was presented). This was again followed by a delay of 1,000 ms. The animal was then given a cue (the disappearance of the fixation spot) at the end of the second delay to make a saccade towards the target location that was presented earlier in the trial. A saccade to the target location within a latency of 150 ms, followed by continued fixation at the saccade location for 200 ms, was considered a correct trial. An illustration of the task is shown in Figure 1a

principal components that explained at least 95% of the variance. In order to decode data in the subspace, the PCA projection matrix described in the previous step was replaced by the matrix specifying the desired subspace (working memory or motor preparation), and the resulting data in the subspace would thus have 7 dimensions.

In this analysis, we used a pseudo-population of N = 226 neurons.
For each trial condition (which was one of seven possible target locations), we trial-averaged and time-averaged the neural activity in Delay 1 (800 to 1,300 ms from target onset) and Delay 2 (2,000 to 2,500 ms from target onset) to obtain two activity matrices of 226 x 7. We then normalized the two activity matrices to the mean of the baseline by subtracting neural responses in the fixation period (300 ms before target onset), and obtained activity matrices $\bar{D_1}$ and $\bar{D_2}$ of size 226 x 7, where each column represented the change in population activity under one condition. $\bar{D_1}$ and $\bar{D_2}$ contained activity with high correlation (Supplementary Fig. 1a,b). We looked for the decorrelated components $\bar{M}$ and $\bar{P}$ (Supplementary Fig. 1c,d), which were also of size 226 x 7, such that $\bar{D_1}$ and $\bar{D_2}$ were linear combinations of $\bar{M}$ and $\bar{P}$ with different mixing coefficients, which could be formulated as:

$$\bar{D_1} = \bar{M} + a\,\bar{P}, \qquad \bar{D_2} = b\,\bar{M} + \bar{P},$$

where $a$ and $b$ were scalars. We can rewrite $\bar{M}$ and $\bar{P}$ as:

$$\bar{M} = \frac{\bar{D_1} - a\,\bar{D_2}}{1 - ab}, \qquad \bar{P} = \frac{\bar{D_2} - b\,\bar{D_1}}{1 - ab},$$

and each pair of $(a, b)$ will determine one pair of $(\bar{M}, \bar{P})$. We concatenated $\bar{M}$ and $\bar{P}$, respectively, across conditions, and evaluated the independence between $\bar{M}$ and $\bar{P}$ by computing the mutual information between the two arrays of size 1582. We performed a parameter search for $(a, b)$ in the range of [−2, 2] with a step size of 0.01. The minimum mutual information obtained was 0.076 bits when $a$ = 0.12 and $b$ = 0.65 (Supplementary Fig. 1). The memory and preparation subspaces were then defined by the bases of $\bar{M}$ and $\bar{P}$.

A similar decorrelation was performed between activity in Delay 1 and the pre-saccadic period (150 to 0 ms prior to saccade onset):

$$\bar{D_1} = \bar{M'} + a'\,\bar{P'}, \qquad \bar{D_S} = b'\,\bar{M'} + \bar{P'},$$

where $\bar{D_S}$ was the activity in the pre-saccade period. Using the same approach as described above, we obtained $\bar{M'}$ and $\bar{P'}$ with a minimum mutual information of 0.086 when $a'$ = 0.01 and $b'$ = 0.71 (Supplementary Fig. 6).
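The parameter search described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation. It assumes the mixing model Delay 1 = M + a·P and Delay 2 = b·M + P, with scalar coefficients named `a` and `b` here, and a simple histogram (plug-in) estimator of mutual information. The estimator used in the paper is not specified in this section, so the bin count and the function names `mutual_information` and `decorrelate` are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information, in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # empty bins contribute nothing
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def decorrelate(d1, d2, grid=None, bins=16):
    """Grid search for (a, b) minimising the mutual information between
    M = (d1 - a*d2) / (1 - a*b) and P = (d2 - b*d1) / (1 - a*b)."""
    if grid is None:
        # Range and step size quoted in the text: [-2, 2] in steps of 0.01.
        grid = np.round(np.arange(-2.0, 2.0 + 1e-9, 0.01), 2)
    x1, x2 = d1.ravel(), d2.ravel()       # concatenate across conditions
    best = None
    for a in grid:
        for b in grid:
            det = 1.0 - a * b
            if abs(det) < 1e-6:           # mixing is not invertible here
                continue
            m = (x1 - a * x2) / det
            p = (x2 - b * x1) / det
            mi = mutual_information(m, p, bins)
            if best is None or mi < best[0]:
                best = (mi, a, b)
    mi, a, b = best
    det = 1.0 - a * b
    return a, b, (d1 - a * d2) / det, (d2 - b * d1) / det, mi
```

The full 401 x 401 grid is slow in pure Python, so a coarser `grid` is useful for a first pass. Histogram estimates of mutual information are also biased upward for small samples, which means the absolute values from this sketch should not be compared directly with the 0.076 bits reported above.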

Angle between subspaces
The subspaces we were comparing had dimensions of 226 x 7 (Supplementary Fig. 5). In order to calculate the angle between two subspaces, we first computed the inner product matrix of size 7 x 7, took the mean of the absolute values to get a scalar value, and lastly, converted this scalar value to an angle by using the inverse cosine function. In 226 dimensions, the angles between 2 random vectors were almost all compressed in the range from 80 to 90 degrees, so the angle by itself did not provide a good indication of similarity between two subspaces with high dimensions. In order to get an unbiased estimation of the similarity of two subspaces in high dimensions (i.e. Mem and Stable in Supplementary Fig. 5), we first computed the angle between Mem and Stable, and then computed the angle between Stable and a random subspace of the same dimension, and lastly, computed the difference between the two angles. By repeating this process, we generated a distribution of the pairwise difference between the two angles. If the 95th percentile of this distribution was smaller than 0, we considered the angle between Mem and Stable to be significantly smaller than chance.
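The angle computation and the comparison against random subspaces can be sketched as below. Several details are assumptions made for illustration: the basis vectors are scaled to unit length before taking inner products, random subspaces are drawn as Gaussian vectors, and only the random subspace is resampled on each repetition (in the paper the subspaces themselves may also vary across bootstraps). The function names are hypothetical.

```python
import numpy as np

def unit_columns(a):
    """Scale each column (basis vector) to unit length."""
    return a / np.linalg.norm(a, axis=0, keepdims=True)

def subspace_angle_deg(a, b):
    """Angle from the mean absolute inner product between two sets of
    basis vectors, each of shape (n_neurons, n_vectors)."""
    gram = unit_columns(a).T @ unit_columns(b)      # e.g. 7 x 7 inner products
    mean_abs = np.clip(np.mean(np.abs(gram)), 0.0, 1.0)
    return float(np.degrees(np.arccos(mean_abs)))

def angle_vs_chance(a, b, n_random=1000, seed=0):
    """Distribution of angle(a, b) - angle(b, random subspace).
    Returns the observed angle and the 95th percentile of the differences;
    a negative percentile means a and b are closer than chance."""
    rng = np.random.default_rng(seed)
    observed = subspace_angle_deg(a, b)
    diffs = np.empty(n_random)
    for k in range(n_random):
        random_subspace = rng.standard_normal(b.shape)
        diffs[k] = observed - subspace_angle_deg(b, random_subspace)
    return observed, float(np.percentile(diffs, 95))
```

One property of this measure is worth keeping in mind: because the mean is taken over all pairs of basis vectors, two identical multi-dimensional subspaces do not yield an angle of 0 degrees, since the off-diagonal inner products pull the mean down. This is consistent with the remark above that the raw angle is hard to interpret and that the difference from a random subspace is the informative quantity.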

Neural Network decoding
In theory, the maximum decodable information in the full state space cannot be lower than the

full space and the subspaces for working memory and motor preparation, we used a neural network classifier that can reduce the issue of overfitting by introducing a validation set and early stopping in the training process. The data were split into non-overlapping training (50%), validation (30%), and testing (20%)

where $\bar{D}_{1E}$ and $\bar{D}_{2E}$ were similar to $\bar{D_1}$ and $\bar{D_2}$, but from error trials. The decoder was trained and validated on the data from correct trials in the subspace and tested on the data from error trials in the same subspace. Although we interpreted the decrease in decoding performance in the two subspaces in error trials to be evidence of the link between these subspaces and the behavior of the animal, an alternative interpretation could be that there was a general increase in noise in the population in error trials (perhaps due to factors like inattention), and this led to a non-specific decrease in information in all subspaces, including the memory and preparation subspaces. However, due to the lack of target information in the null space, this non-specific decrease in information in the null space would not be observable because of the floor effect. In order to rule this possibility out, we quantified the intra-cluster variance in the full space across locations for correct and error trials in both Delays 1 and 2 (refer to Supplementary Figure 7). We found no evidence that the intra-cluster variance in Delay 1 was higher in error trials than in correct trials (P > 0.46, g = 0.85), and found the intra-cluster variance in error trials in Delay 2 to actually be lower than in correct trials (P < 0.01, g = 6.6), presumably due to the effects of divisive normalization.
These results indicated that the drop in performance in the working memory and motor preparation subspaces in error trials was not due to a non-specific increase in noise, but was more likely due to the fact that the responses in error trials deviated significantly from those in correct trials, resulting in lower information in the two subspaces.

Identification of null space
The memory and preparation subspaces both had dimension of 226 x 7 (i.e. 7 vectors in a 226-dimensional space). We used the MATLAB function null() to find the complementary space of the conjunctive subspace for both memory and preparation. The null space had a rank of 212, and was orthogonal to both the working memory and motor preparation subspaces. 117 PCs in the null space could explain 95% of the total variance after projecting the full space activity into the null space. Any information about the target location that was not captured by the two subspaces would be captured by the null space.

We considered two bootstrapped distributions to be significantly different if the 95th percentile range of the two distributions did not overlap. We also computed an estimated p-value for this comparison using the following formula 28,

$$p = \frac{1 + X}{N + 1},$$

where $X$ represents the number of overlapping data points between the two distributions and $N$ represents the number of bootstraps. With this computation, and the $N$ = 1000 bootstraps we used throughout the paper, two distributions with no overlap will result in a p-value < 0.001, and two distributions with x% of overlap will result in a p-value ~ x/100.
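Both procedures in this section can be sketched with NumPy. The null space is computed here with a singular value decomposition rather than MATLAB's null(), which should span the same complementary space. For the p-value, the way overlapping data points are counted is not spelled out in this section, so the sketch assumes paired bootstrap samples and counts the pairs whose ordering contradicts the ordering of the means. Both function names are hypothetical.

```python
import numpy as np

def null_space_basis(*subspaces, tol=1e-10):
    """Orthonormal basis of the space orthogonal to all given subspaces.
    Each subspace has shape (n_neurons, n_vectors)."""
    stacked = np.hstack(subspaces)                   # e.g. 226 x 14
    u, s, _ = np.linalg.svd(stacked, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    return u[:, rank:]                               # e.g. 226 x 212

def bootstrap_p_value(dist_a, dist_b):
    """Estimated p-value (1 + X) / (N + 1) for two bootstrapped
    distributions of equal length N, with X counted from paired samples
    (an assumption about how overlap is defined)."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    high, low = (a, b) if a.mean() >= b.mean() else (b, a)
    overlap = int(np.sum(high - low <= 0))           # pairs contradicting the mean ordering
    return (1 + overlap) / (len(a) + 1)
```

With two 226 x 7 subspaces of full rank, `null_space_basis` returns 212 orthonormal vectors, matching the rank quoted above. Projecting full-space activity onto these vectors and running the decoder on the result corresponds to the projNull(M+P) condition.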

In addition to the estimated p-value, we also computed the effect size of the comparison using a measure known as Hedges' g, computed using the following formula 29,

where $x_t$ was the population firing rate at time $t$; $W_{rec}$ was the recurrent connection weight between units; $I_t$ was the external input at time $t$; $W_{in}$ was the loading weight of the input signal to the population; $\epsilon$ was a noise term selected from a normal distribution from 0 to 0.1 (not including 0 or 0.1, denoted as N(0,0.1)); and $F(\cdot)$ was a divisive normalization function that kept the mean population firing rate at the baseline level. We constructed the recurrent weight matrix from eigendecomposition:

$$W_{rec} = V \Lambda V^{-1},$$

where $V$ was a random square matrix whose columns were the eigenvectors of $W_{rec}$, and $\Lambda$ was a diagonal matrix whose diagonal elements were the corresponding eigenvalues for each eigenvector. The first 17 eigenvalues in $\Lambda$ were 1 (thus there were 17 stable eigenvectors), while the rest of the eigenvalues were randomly chosen between 0 and 1 using a uniform distribution. In each simulation, we assigned 1 stable eigenvector for baseline activity (with entries selected from a uniform distribution U(0,1)), 8 stable eigenvectors for working memory activity, and 8 stable eigenvectors for motor preparation activity (with entries selected from U(1,2)). In order to ensure that decoding performance in Delay 1 and Delay 2 was the same, we imposed a positive mean for the motor preparation activity, so that the incorporation of motor preparation in Delay 2 would elevate the population mean, and divisive normalization would reduce the mean activity of both working memory and motor preparation information. Otherwise, if the motor preparation activity had zero mean, there would be a significant increase of decoding performance in Delay 2.
In the input weight matrix, the input activity for working memory corresponded to the 8 working memory eigenvectors, and the input activity for motor preparation corresponded to the 8 motor preparation eigenvectors. The distractor inputs had the same input activity as did the target inputs, but with a lower magnitude (0.2 compared to target). At the beginning of each trial, the population started with baseline activity equal to the stable baseline eigenvector, then the input

that requires memorizing locations of items, 1 out of 2 possible locations, and their colors, 1 out of 3 possible colors, which are uncorrelated). When each target location is associated with only one stimulus color (similar to our working memory and motor preparation locations), the incorporation of stimulus color information in Delay 2 would add only 1 out of 3 possible shifts (representing the 3 possible stimulus colors) to the clusters representing target location (Supplementary Fig. 15a). However, when target location and stimulus color are uncorrelated (each stimulus color is equally likely to appear in each target location), the incorporation of stimulus color information could add any of the 3 possible shifts to the clusters representing target location activity, leading to much more diffuse clusters (Supplementary Fig. 15b). In this latter case, we propose a more general formulation to estimate the information subspaces for target location and stimulus color. First, we group trials by target location and obtain the trial-averaged and time-averaged activity in Delay 1 ($\bar{D_1}$). Next, we group trials by stimulus color and obtain the trial-averaged and time-averaged activity in Delay 2 ($\bar{D_2}$

as memory and preparation have the same grouping, and the expectations reduce to $\bar{M}$ and $\bar{P}$ as there is one-to-one mapping).
We would perform the same parameter search on $(a, b)$ that will give the least mutual information between $\bar{M}$ and $\bar{P}$.

The projections into the first and second subspaces exhibited different temporal profiles. b, We projected the full space activity from both Delay 1 and Delay 2 (time-averaged in each period) into the two subspaces, and calculated the cumulative percent variance explained by the principal components in each projection. In both subspaces, 6 PCs were needed to explain more than 95% of the variance.

independent variables of target locations (7 locations) and task epoch (Delay 1 and Delay 2) to categorize cells as: 1) those with exclusive working memory selectivity (those with target information in both Delay 1 and Delay 2, and with selectivity to target location and task epoch, but no interaction, 27.6% of cells); and 2) those with mixed selectivity to target location and task epoch (those with a significant main effect of target location and task epoch, as well as a significant interaction between target location and task epoch, 43.9% of cells). Additionally, we used two one-way ANOVAs of target location (one in Delay 1, and one in Delay 2) to categorize cells as those with exclusive motor preparation selectivity (those with significant selectivity in Delay 2, but not Delay 1, 28.6% of cells). Among the cells that exhibited selectivity in the delay periods (98 cells), we estimated that 27.6% had exclusive working memory selectivity, 28.6% had exclusive motor preparation selectivity, and 43.9% had mixed selectivity to both working memory

generated 1000 stable memory subspaces, and computed the difference in angle between each of the stable memory subspaces with Amem, and the difference in angle between the stable