Nominally non-responsive frontal and sensory cortical cells encode task-relevant variables via ensemble consensus-building

Neurons recorded in behaving animals often do not discernibly respond to sensory input and are not overtly task-modulated. These nominally non-responsive neurons are difficult to interpret and are typically neglected from analysis, confounding attempts to connect neural activity to perception and behavior. Here we describe a trial-by-trial, spike-timing-based algorithm to reveal the hidden coding capacities of these neurons in auditory and frontal cortex of behaving rats. Responsive and nominally non-responsive cells contained significant information about sensory stimuli and behavioral decisions, and network modeling indicated that nominally non-responsive cells are important for task performance. Sensory input was more accurately represented in frontal cortex than auditory cortex, via ensembles of nominally non-responsive cells coordinating the behavioral meaning of spike timings on correct but not error trials. This unbiased approach allows the contribution of all recorded neurons - particularly those without obvious task-modulation - to be assessed for behavioral relevance on single trials.

Neurons recorded in behaving animals often do not discernibly respond to sensory input and are not overtly task-modulated. These nominally non-responsive neurons are difficult to interpret and are typically neglected from analysis, confounding attempts to connect neural activity to perception and behavior. Here we describe a trial-by-trial, spike-timing-based algorithm to reveal the hidden coding capacities of these neurons in auditory and frontal cortex of behaving rats. Responsive and nominally non-responsive cells contained significant information about sensory stimuli and behavioral decisions, and network modeling indicated that nominally non-responsive cells are important for task performance. Sensory input was more accurately represented in frontal cortex than auditory cortex, via ensembles of nominally non-responsive cells coordinating the behavioral meaning of spike timings on correct but not error trials. This unbiased approach allows the contribution of all recorded neurons -particularly those without obvious task-modulation -to be assessed for behavioral relevance on single trials.
Spike trains recorded from the cerebral cortex of behaving animals can be complex, highly variable from trial-to-trial, and therefore challenging to interpret. A fraction of recorded cells typically exhibit trial-averaged responses with obvious task-related features and can be considered 'classically responsive', such as neurons with tonal frequency tuning in the auditory cortex or orientation tuning in the visual cortex. Another population of responsive cells are modulated by multiple task parameters ('mixed selectivity cells'), and have recently been shown to have computational advantages necessary for flexible behavior 1. . However, a substantial number of cells have variable responses that fail to demonstrate any obvious trial-averaged relationship to task parameters [2][3][4][5] . These 'nominally nonresponsive' neurons are especially prevalent in frontal cortical regions but can also be found throughout the brain, including primary sensory cortex [4][5][6] . These response categories are not fixed but can be dynamic, with some cells apparently becoming nominally nonresponsive during task engagement without impairing behavioral performance [7][8][9] . The potential contribution of these cells to behavior remains to a large extent unknown and represents a major conceptual challenge to the field 2 .
How do these nominally non-responsive cells relate to behavioral task variables on single trials? While there are sophisticated approaches for dissecting the precise correlations between responsive cells and task structure 3,4,10-12 there is still a need for complementary and straightforward analytical tools for understanding any and all activity patterns encountered 1,3,4 . Moreover, most behavioral tasks produce dynamic activity patterns throughout multiple neural circuits, but we lack unified methods to compare activity across different regions, and to determine to what extent these neurons might individually or collectively perform task-relevant computations. To address these limitations, we devised a novel trial-to-trial analysis using Bayesian inference that evaluates the extent to which single-unit responses and ensembles encode behavioral task variables.

Nominally non-responsive cells prevalent in auditory and frontal cortex during behavior
We trained 15 rats on an audiomotor frequency recognition go/no-go task 8,[13][14][15] that required them to nose poke to a single target tone for food reward and withhold from responding to other non-target tones (Fig. 1a). Tones were 100 msec in duration presented sequentially once every 5-8 seconds at 70 dB sound pressure level (SPL); the target tone was 4 kHz and non-target tones ranged from 0.5-32 kHz separated by one octave intervals.
After a few weeks of training, rats had high hit rates to target tones and low false alarm rates to non-targets, leading to high d' values (mean performance shown in Fig. 1b; each individual rat included in this study shown in Supplementary Fig. 1).
To correctly perform this task, animals must first recognize the stimulus and then execute an appropriate motor response. We hypothesized that two brain regions important for this behavior are the auditory cortex (AC) and frontal cortical area 2 (FR2). Many but not all auditory cortical neurons respond to pure tones with reliable, short-latency phasic responses 6,[16][17][18][19][20][21] . These neurons can process sound in a dynamic and context-sensitive manner, and AC cells are also modulated by expectation, attention, and reward structure, strongly suggesting that AC responses are important for auditory perception and cognition 4,[21][22][23][24] . Previously we found that the go/no-go tone recognition task used here is sensitive to AC neuromodulation and plasticity 13 . In contrast, FR2 is not thought to be part of the canonical central auditory pathway, but is connected to many other cortical regions including AC 25,26 . This region has recently been shown to be involved in orienting responses, categorization of perceptual stimuli, and in suppressing AC responses during movement 10,25,27 . These characteristics suggest that FR2 may be important for goaloriented behavior.
We first asked if activity in AC or FR2 is required for animals to successfully perform this audiomotor task. We implanted cannulas into AC or FR2 (Supplementary Fig. 2), and infused the GABA agonist muscimol bilaterally into AC or FR2, to inactivate either region prior to testing behavioral performance. We found that task performance was impaired if either of these regions was inactivated, although general motor functions, including motivation or ability to feed were not impaired (Supplementary Fig. 3; for AC p=0.03; for FR2 p=0.009 Student's paired two-tailed t-test). Thus activity in both AC and FR2 may be important, perhaps in different ways, for successful task performance.
Once animals reached behavioral criteria (hit rates ≥70% and d' values ≥1.5), they were implanted with tetrode arrays in either AC or FR2 (Supplementary Fig. 4). After recovery, we made single-unit recordings from individual neurons or small ensembles of 2-8 cells during task performance. The trial-averaged responses of some cells exhibited obvious task-related features: neuronal activity was tone-modulated compared to inter-trial baseline activity (Fig. 1c) or gradually increased over the course of the trial as measured by a ramping index ( Fig. 1d; hereafter referred to as 'ramping activity'). However, 60% of recorded cells were nominally non-responsive in that they were neither tone modulated nor ramping according to statistical criteria (Fig. 1e,f; Supplementary Fig. 5; 64/103 AC cells and 43/74 FR2 cells from 15 animals had neither significant tone-modulated activity or ramping activity; pre and post-stimulus mean activity compared via bootstrapping and considered significant when p<0.05; ramping activity measured with linear regression and considered significant via bootstrapping when p<0.05 and r>0.5; for overall population statistics see Supplementary Fig. 6).

Novel single-trial, ISI-based algorithm for decoding non-responsive activity
Given that the majority of our recordings were from nominally non-responsive cells, we developed a general method for interpreting neural responses even when trial-averaged responses were not obviously task-modulated which allowed us to compare coding schemes across different brain regions (here, AC and FR2). The algorithm is agnostic to the putative function of neurons as well as the task variable of interest (here, stimulus category or behavioral choice).
Our algorithm uses the interspike intervals (ISIs) of individual neurons to decode the stimulus category (target or non-target) or behavioral choice (go or no-go) on each trial. In principle, any response property could be used with our method; however, we chose the ISI because its distribution could vary between task conditions even without changes in the firing rate building on previous work demonstrating that the ISI distribution contains complementary information to the firing rate [28][29][30] . The distinction between the ISI distribution and trial-averaged firing rate is subtle, yet important. While the ISI is obviously closely related to the instantaneous firing rate, decoding with the ISI distribution is not simply a proxy for using the time-varying, trial-averaged rate. To demonstrate this we constructed three model cells: a stimulus-evoked cell with distinct target and non-target ISI distributions (Fig. 2a), a stimulus-evoked cell with identical ISI distributions (Fig. 2b), and a nominally non-responsive cell with distinct target and non-target ISI distributions (Fig.   2c). These models clearly demonstrate that trial-averaged rate modulation can occur with or without corresponding differences in the ISI distributions and cells without apparent trial-averaged rate-modulation can nevertheless have distinct ISI distributions. Taken together, these examples demonstrate that the ISI distribution and trial-averaged firing rate capture different spike train statistics. This has important implications for decoding nonresponsive cells that by definition do not exhibit large firing rate modulations but nevertheless may contain information hidden in their ISI distributions.
For each recorded neuron, we built a library of ISIs observed during target trials and a library for non-target trials from a set of 'training trials'. Two different cells from AC are shown in Fig. 3 and Supplementary Fig. 7a-d, and another cell from FR2 is shown in Supplementary Fig. 7e-h. These libraries were used to infer the probability of observing an ISI during a particular trial type (Fig. 3b,c; Supplementary Fig. 7c,g; left panels show target in red and non-target in blue). These conditional probabilities were inferred using non-parametric statistical methods to minimize assumptions about the underlying process generating the ISI distribution and better capture the heterogeneity of the observed ISI distributions ( Fig. 3b; Supplementary Fig.7c,g). We verified that our observed distributions were better modeled by non-parametric methods rather than standard parametric methods (e.g. rate-modulated Poisson process; Supplementary Fig. 8).
Specifically, we found the distributions using Kernel Density Estimation where the kernel bandwidth for each distribution was set using 10-fold cross-validation. We then used these training set probability functions to decode a spike train from a previously unexamined individual trial from the set of remaining 'test trials'. This process was repeated 124 times using 10-fold cross-validation with randomly generated folds.
Importantly, while the probabilities of observing particular ISIs on target and non-target trials were similar ( Fig. 3b; Supplementary Fig. 7c,g), small differences between the curves carried sufficient information to allow for decoding. To characterize these differences, we used the weighted log likelihood ratio (W. LLR; Fig. 3c; Supplementary   Fig. 7c,g) to clearly represent which ISIs suggested target (W. LLR >0) or non-target (W. LLR <0) stimulus categories. Our algorithm relies only on statistical differences between task conditions; therefore, the W. LLR summarizes all spike timing information necessary for decoding. Similar ISI libraries were also computed for behavioral choice categories (Fig. 3b,c; Supplementary Fig. 7c,g; right panels show go decision in green and no-go in purple). These examples clearly illustrate that the relationship between the ISIs and task variables can be non-monotonic: in the cell shown in Fig. 3, short ISIs (ISI <50 msec) indicated non-target, medium ISIs (50 msec < ISI < 100 msec) indicated target, and longer ISIs indicated non-target (100 msec < ISI). This non-monotonic relationship would be impossible to capture with a simple ISI or firing rate threshold.
The algorithm uses the statistical prevalence of certain ISI values under particular task conditions (in this case the ISIs accompanying stimulus category or behavioral choice), to infer the task condition for each trial. Each trial begins with equally uncertain probabilities about the stimulus categories (i.e., p(target) = p(non-target) = 50%). As each ISI is observed sequentially within the trial, the algorithm applies Bayes' rule to update p(target|ISI) and p(non-target|ISI) using the likelihood of the ISI under each stimulus category (p(ISI|target) and p(ISI|non-target) ( Fig. 3b-d). As shown for one trial of the example cell in Fig. 3d, ISIs observed between 0-1.0 seconds consistently suggested the presence of the target tone, whereas ISIs observed between 1.0-1.4 seconds suggested the non-target category thereby also necessarily reducing the belief that a target tone was played (Fig. 3d, top trace). After this process was completed for all ISIs in the particular trial, we obtained the probability of a non-target tone and a target tone as a function of time during the trial (Fig. 3d). The prediction for the entire trial p(target|ISI) is evaluated at the end of the trial (in the example trial, p(target|ISI) = 61% ; Fig 3d). This process is repeated for the behavioral choice (Fig 3b-d; right panels; trials separated according to go, no-go; probabilities of ISIs in each condition generated; conditional probabilities used as likelihood function to predict behavioral choice on a given trial). The single-trial decoding performance of each neuron is then averaged over all trials as a measure of the overall ability of each neuron to distinguish behavioral conditions (Fig. 4a).

Nominally non-responsive cells reveal hidden task information
Can we uncover hidden task information from nominally non-responsive cells? We found that nominally non-responsive cells in both AC and FR2 provided significant information about each task variable (Fig. 4a,b, red). The ability to decode was independent of average firing rate (Supplementary Fig. 9a-f, 0.30 < r < 0.46), z-score (Supplementary Fig. 9g-i, -0.05 < r < 0.05), and ramping activity (Supplementary Fig. 9j, -0.02 < r < 0.28).
Stimulus decoding performance was also independent of receptive field properties including best frequency and tuning curve bandwidth for AC neurons ( Supplementary   Fig. 10).
We also observed that task information was distributed across both AC and FR2, and neural spike trains from individual units were multiplexed in that they often encoded information about both stimulus category and choice simultaneously (Fig. 4b). To establish that multiplexing was not simply a byproduct of the correlation between stimulus and choice variables, an independent measure of multiplexing relying on multiple regression was applied ( Supplementary Fig. 11). Despite the broad sharing of information about behavioral conditions, there were notable systematic differences between AC and FR2.
Surprisingly, neurons in FR2 were more informative about stimulus category than AC, and AC neurons were more informative about choice than stimulus category (Fig. 4a, pAC=0.016, pstim=0.0013, Mann-Whitney U test, two-sided). Both of these observations would not have been detected at the level of the PSTH, as most cells in AC were nonresponsive for behavioral choice (no ramping activity, 91/103), yet our decoder revealed that these same cells were as informative as choice responsive cells (Fig. 4c, p=0.32 Mann-Whitney U test, two-sided; red circles indicate cells non-responsive for both variables, dark-red cells are choice non-responsive, and black cells are responsive). Similarly, most cells in FR2 were sensory non-responsive (not tone modulated, 60/74), yet contained comparable stimulus information to sensory responsive cells (Fig. 4d, p=0.29 Mann-Whitney U test, two-sided; red cells are non-responsive for both variables, dark-red cells are sensory non-responsive, black cells are responsive).
To assess the statistical significance of these results, we tested our algorithm on two shuffled data sets. First, we ran our analysis using synthetically-generated trials that preserved trial length but randomly sampled ISIs with replacement from those observed during a session without regard to condition (Fig. 4e). Second, we left trial activity intact, but permuted the stimulus category and choice for each trial (Fig. 4f). We restricted analysis to cells with decoding performance significantly different from synthetic spike trains (all cells in To directly assess the extent to which information captured by the ISI distributions in our data set was distinct from the time-varying rate, we compared the performance from our ISI-based decoder to a conventional rate-modulated (inhomogeneous) Poisson decoder 31 which assumes that spikes are produced randomly with an instantaneous probability equal to the time-varying firing rate. As our model cells illustrate (Fig. 2), it is possible to decode using the ISI distributions even when firing rates are uninformative (Supplementary Fig.   12a). When applied to our dataset, the ISI-based decoder generally outperformed this conventional rate-based decoder confirming that ISIs capture information distinct from that of the firing rate (Supplementary Fig. 12b; Overall stimulus decoding performance: pAC=0.0001, pFR2=8×10 -6 ; Overall choice decoding performance pAC=0.0057, pFR2=0.02, Mann-Whitney U test, two-sided). Moreover, comparing single trial decoding outcomes demonstrated weak to no correlations between the ISI-based decoder and the conventional rate decoder, further underscoring that these two methods rely on different features of the spike train to decode (Supplementary Fig. 12c; stimulus medians: AC=0.10 FR2=0.11; choice medians: AC=0.07, FR2=0.08).
We hypothesize that ISI-based decoding is biologically plausible. Short-term synaptic plasticity and synaptic integration provide powerful mechanisms for differential and specific spike-timing-based coding. We illustrated this capacity by making whole-cell recordings from AC neurons in vivo and in brain slices (Supplementary Fig. 13a,b), as well as in FR2 brain slices (Supplementary Fig. 13c). In each case, different cells could have distinct response profiles to the same input pattern, with similar overall rates but different spike timings.
Moreover, we note that this type of coding scheme requires few assumptions about implementation, and does not require additional separate integrative processes to compute rates or form generative models. Thus ISI-based decoding coding could be generally applicable across brain areas, as demonstrated here for AC and FR2.

task-switching paradigm
To further demonstrate the generalizability and utility of our approach, we applied our decoding algorithm to neurons that were found to be non-responsive in a previously published study 5 . In this study, rats were trained on a novel auditory stimulus selection task where depending on the context animals had to respond to one of two cues while ignoring the other. Rats were presented with two simultaneous sounds (a white noise burst and a warble). In the "localization" context the animal was trained to ignore the warble and respond to the location of the white noise burst and in the "pitch" context it was trained to ignore the location of the white noise burst and respond to the pitch of the warble (Fig. 5a).
The main finding of the study is that the pre-stimulus activity in both primary auditory cortex and prefrontal cortex encodes the selection rule (i.e. activity reflects whether the animal is in the localization or pitch context). This conclusion was entirely based on a difference in pre-stimulus firing rate between the two contexts. The authors reported, but did not further analyze, cells that did not modulate their pre-stimulus firing rate. In our nomenclature these cells are "non-responsive for the selection rule". Using our algorithm we found that the ISI distributions of these cells encoded the selection rule and were significantly more informative than the responsive cells (Fig. 5b, pAC=5×10 -6 , pPFC<0.0002, Mann-Whitney U test, two-sided). This surprising result demonstrates that our algorithm generalizes to novel datasets, and may be used to uncover coding for cognitive variables that are hidden from conventional trial-averaged analyses.
Furthermore, these results indicate that as task complexity increases nominally nonresponsive cells are differentially recruited for successful task execution.

Nominally non-responsive ensembles are better predictors of behavioral errors
Downstream brain regions must integrate the activity of many neurons and this ISI-based approach naturally extends to simultaneously recorded ensembles. We therefore asked whether using small ensembles would change or improve decoding. To decode from ensembles, likelihood functions from each cell were calculated independently as before, but were used to simultaneously update the task condition probabilities (p(target | ISI) and p(go | ISI)) on each trial (Fig. 6a). Analyzing ensembles of 2-8 neurons in AC and FR2 significantly improved decoding for both variables in FR2 and stimulus decoding in AC ( Fig. 6b, pAC stim=0.04, pFR2 stim=1×10 -5 , pAC=0.29, pFR2 choice=7×10 -5 , Mann-Whitney U test, two-sided). This was not a trivial consequence of using more neurons, as the information provided by individual ISIs on single trials can be contradictory (e.g., compare LLR functions in Fig. 3c and Supplementary Fig. 7c for 50 ms < ISIs < 120 ms). For ensemble decoding to improve upon single neuron decoding, the ISIs of each member of the ensemble must indicate the same task variable.
Can our decoding method predict errors on a trial-by-trial basis? In general, trial-averaged PSTHs did not reveal systematic differences between correct and error trials. However, when we examined single-trial performance with our algorithm, ensembles of neurons in AC and FR2 predicted behavioral errors (Fig. 6c). In general, ensembles in AC predicted behavioral errors significantly better than those in FR2 (Fig. 6c, for 3-member ensembles: p=1.2×10 -5 , for 4-member ensembles: p=0.03, Mann-Whitney U test, two-sided).

Recurrent neural network model demonstrates nominally non-responsive cells are necessary for task performance and synergistically interact with responsive cells
Our data indicate that nominally non-responsive activity encodes hidden task-information and can predict behavioral errors, but how does it impact task performance? Examining this question requires network modeling to further explore the dynamics of responsive and non-responsive cells. We carried out a series of simulated perturbation experiments on a recurrent neural network trained to perform our frequency recognition task ( Fig. 7a,b).
Inactivation of both responsive and non-responsive units impaired task performance ( Fig.   7c) indicating that both sub-populations are necessary to complete the task. Surprisingly, inactivation of a random subset of units (including both responsive and non-responsive units) resulted in larger performance decreases than what would be predicted from the psychometric inactivation curves of responsive and non-responsive units treated independently (Fig. 7d). This finding suggests that responsive and non-responsive units have a synergistic effect on overall task performance and that both sub-populations should be considered in concert to fully understand behavioral performance.

Ensemble consensus-building dynamics underlie hidden task information
While improvements were seen in decoding performance with increasing ensemble size, the ISI distributions/ISI-based likelihood functions were highly variable across individual ensemble members. Thus, we wondered if there was hidden task-related structure in the population activity that evolved over the course of the trial to instantiate behavior. To answer this question, we examined whether local ensembles share the same representation of task variables over the course of the trial. Do they "reach consensus" on how to represent task variables using the ISI (Fig. 8a)? Without consensus, a downstream area would need to interpret ensemble activity using multiple disparate representations rather than one unified code (Fig. 8b). The firing rates and ISI distributions of simultaneously-recorded units were generally variable across cells requiring an exploratory approach to answer this question (Fig. 8c, example three-member ensemble with heterogeneous conditional ISI distributions). Therefore, we examined changes in the distributions of ISIs across task conditions, asking how the moment-to-moment changes in the log-likelihood ratio (LLR) of each cell were coordinated to encode task variables (Fig. 8c). We focused on the LLR because it quantifies how the ISI represents task variables for a given cell and summarizes all spike timing information needed by our algorithm (or a hypothetical downstream cell) to decode.
We examined how ensembles coordinate their activity moment-to-moment over the course of the trial by quantifying the similarity of the LLRs across cells in a sliding window.
Similarity was assessed by summing the LLRs of ensemble members, calculating the total area underneath the resulting curve, and normalizing this value by the sum of the areas of each individual LLR. We refer to this quantified similarity as 'consensus'; a high consensus value indicates that the LLRs generally agree on stimulus or choice while a low value indicates disagreement at that moment (Fig. 8d). We should emphasize that successful ensemble decoding (Fig. 6) does not require the LLRs of ensemble members to be related in any way; therefore, structured LLR dynamics (Fig. 8) are not simply a consequence of how our algorithm is constructed.
While the conventional trial-averaged PSTH of non-responsive ensembles recorded in AC and FR2 showed no task-related modulation, our analysis revealed structured temporal dynamics of the LLRs (captured by the consensus value). In FR2, sensory non-responsive ensembles (ensembles in which at least two out of three cells were not tone-modulated) encode stimulus information using temporally-precise stimulus-related dynamics on correct trials. The stimulus representation of sensory non-responsive ensembles reached consensus rapidly after stimulus onset followed by divergence (Fig. 8e, stimulus-aligned, These results reveal that consensus-building and divergence occur at key moments during the trial for successful execution of behavior in a manner that is invisible at the level of the PSTH. As sensory and choice non-responsive ensembles participated in these dynamics, changes in the consensus value cannot simply be a byproduct of correlated firing rate modulation due to tone-evoked responses or ramping. While consensus-building can only indicate a shared representation, divergence can indicate one of two things: (1) the LLRs of each cell within an ensemble are completely dissimilar or (2) they are 'out of phase' with one another -the LLRs partition the ISIs the same way (Fig. 8d, dotted lines), but the same ISIs code for opposite behavioral variables. This distinction is important because (2) implies coordinated structure of ensemble activity (the partitions of the ISI align) whereas (1) does not. To distinguish between these two possibilities we used the 'unsigned consensus', a second measure sensitive to the ISI partitions but insensitive to the sign of the LLR. Both 'in phase' and perfectly 'out of phase' LLRs would produce an unsigned consensus of 1 whereas unrelated LLRs would be closer to 0 (Fig. 8d). For example, in the second row of Figure 8d, both cells agree that ISIs < 100 ms indicate one stimulus category and ISIs > 100 ms indicate another, but they disagree about which set of ISIs mean target and which mean non-target. This results in a consensus value of 0 (out of phase) but an unsigned consensus value of 1.
Using this metric, we found that the unsigned consensus pattern for nominally nonresponsive ensembles (ensembles with two or more nominally non-responsive members) were shared between AC and FR2 -increasing until ~750 ms after tone onset on correct trials (Fig. 8g, stimulus-aligned, Dconsensus, t = 0 to 0.89 s, p = 1.7×10 -5 Wilcoxon test, two-sided). This intriguing observation reveals the timed coordination of nominally nonresponsive ensembles coincident across brain regions and suggests that these cells may constitute a distinct functional network separate from that of responsive cells. Nominally non-responsive ensembles in AC and FR2 also increased their unsigned consensus immediately before behavioral response (although values in AC were lower overall; Fig.   8g, response-aligned, Dconsensus, t = -1.0 to 0.0 s, p = 0.0011 Wilcoxon test, two-sided).
This pattern of consensus-building was only present on correct trials but not incorrect trials ( Fig. 8h, Dconsensus compared to error trials, p = 1.9×10 -9 Mann-Whitney U test, twosided) suggesting that behavioral errors might result from a general lack of consensus between ensemble members. In summary, we have shown that cells which appear unmodulated during behavior do not encode task information independently, but do so by synchronizing their representation of behavioral variables dynamically during the trial.

Discussion
Using a straightforward, single-trial, ISI decoding algorithm that makes few assumptions about the proper model for neural activity, we found task-specific information extensively represented by what appeared to be nominally non-responsive neurons in both AC and FR2. Furthermore, the degree to which single neurons were task-modulated was uncorrelated with conventional response properties including frequency tuning. AC and FR2 each represent both task-variables; furthermore, in both regions we identified many multiplexed neurons that simultaneously represented the sensory input and the upcoming behavioral choice including non-responsive cells. This highlights that the cortical circuits that generate behavior exist in a distributed network -blurring the traditional modular view of sensory and frontal cortical regions.
Most notably, FR2 has a better representation of task-relevant auditory stimuli than AC.
The prevalence of stimulus information in FR2 might be surprising given that AC reliably responds to pure tones in untrained animals; however, when tones take on behavioral significance, this information is encoded more robustly in frontal cortex, suggesting that this region is critical for identifying the appropriate sensory-motor association.
Furthermore, the stark improvement in stimulus encoding for small ensembles in FR2 suggests that task-relevant stimulus information is reflected more homogeneously in local firing activity across FR2 (perhaps through large scale ensemble consensus-building) while this information is reflected in a more complex and distributed manner throughout AC.
The finding that the ISI-based approach of our algorithm is not reducible to rate despite their close mathematical relationship raises the question of how downstream regions could respond preferentially to specific ISIs. Our whole cell recordings from both AC and FR2 demonstrate that different postsynaptic cells can respond differently to the same input pattern with a fixed overall rate emphasizing the importance of considering a code sensitive to precise spike-timing (Supplementary Fig. 13). Furthermore, this is supported by experimental and theoretical work showing that single neurons can act as resonators tuned to a certain periodicity of firing input 32 . This view could also be expanded to larger neuronal populations comprised of feedback loops that would resonate in response to particular ISIs. In this case, cholinergic neuromodulation could offer a mechanism for adjusting the sensitivities of such a network during behavior on short time-scales by providing rapid phasic signals 33 .
It is still unclear what the relevant timescales of decoding might be in relation to phenomena such as membrane time constants, periods of oscillatory activity, and behavioral timescales. Given that our ISI-based decoder and conventional rate-modulated decoders reveal distinct information, future approaches might hybridize these rate-based and temporal-based decoding methods to span multiple timescales.
We have also shown that underlying the task-relevant information encoded by each ensemble is a rich set of consensus-building dynamics that is invisible at the level of the PSTH. Ensembles in both FR2 and AC underwent stimulus and choice-related consensus building that was only observed when the animal correctly executed the task. Moreover, nominally non-responsive cells demonstrated temporal dynamics synchronized across regions which were distinct from responsive ensembles. This raises the possibility that nominally non-responsive ensembles constitute a discrete functional network distinguishable from responsive ensembles. These results underscore the importance of measuring neural activity in behaving animals and using unbiased and generally-applicable analytical methods, as the response properties of cortical neurons in a behavioral context become complex in ways that challenge our conventional assumptions [7][8][9]34 . Animals were rewarded with food for nose poking within 2.5 seconds of presentation of the target tone (4 kHz) and given a short 7-second time-out for incorrectly responding to non-target tones (0.5, 1, 2, 8, 16, 32 kHz). Incorrect responses include either failure to enter the nose port after target tone presentation (miss trials) or entering the nose port after non-target tone presentation (false alarms). Tones were 100 msec in duration and sound intensity was set to 70 dB SPL. Tones were presented randomly with equal probability such that each stimulus category was presented. The inter-trial interval delays used were 5, 6, 7, or 8 seconds.
For experiments involving muscimol, we implanted bilateral cannulas in either FR2 (+2.0 to +4.0 mm AP, ±1.3 mm ML from Bregma) of 7 animals or AC (-5.0 to -5.8 mm AP, 6.5-7.0 mm ML from Bregma) of 3 animals. We infused 1 µL of muscimol per side into FR2 or infused 2 µL of muscimol per side into AC, at a concentration of 1 mg/mL.
For saline controls, equivalent volumes of saline were infused in each region. Behavioral testing was performed 30-60 minutes after infusions. Power analysis was performed to determine sample size for statistical significance with a power of b: 0.8; these studies required at least 3 animals, satisfied in the experiments of Supplementary Fig. 3b,e. For motor control study, animals could freely nose poke for food reward without presentation of auditory stimuli after muscimol and saline infusion.
Stainless steel screws and dental cement were used to secure the microdrive to the skull, and one screw was used as ground. Each drive consisted of 8 independently adjustable tetrodes. The tetrodes were made by twisting and fusing four polyimide-coated nichrome wires (Sandvik Kanthal HP Reid Precision Fine Tetrode Wire; wire diameter 12.5 µm).
The tip of each tetrode was gold-plated to an impedance of 300-400 kOhms at 1 kHz (NanoZ, Neuralynx).

Electrophysiological recordings & unit isolation
Recordings in behaving rats were performed as previously described 8 . After the animal recovered from surgery (~7 days) recordings began once performance returned to presurgery levels. Tetrodes were advanced ~60 µm 12 hours prior to each recording session, to a maximum of 2.5mm (for FR2) or 2.0 mm (for AC) from the pial surface. For recording, signals were first amplified onboard using a small 16-bit unity-gain preamplifier array (CerePlex M, Blackrock Microsystems) before reaching the acquisition system. Spikes were sampled at 30 kS/sec and bandpass filtered between 250 Hz and 5 kHz. Data were digitized and all above-threshold events with signal to noise ratios > 3:1 were stored for offline spike sorting. Single-units were identified on each tetrode using OfflineSorter (Plexon Inc.) by manually classifying spikes projected as points in 2D or 3D feature space.
The parameters used for sorting included the waveforms projection onto the first two principal components, energy, and nonlinear energy. Artifacts were rejected based on refractory period violations (< 1 msec). Clustering quality was assessed based on the Isolation Distance and Lratio sorting quality metrics. To be initially included for analysis, cells had to have > 3 spikes per trial for 80% of trials to ensure that there were enough ISIs to reliably estimate the ISI probability density functions.

Statistical tests for non-responsiveness
We used two positive statistical tests for non-responsiveness: one to establish a lack of tone-modulation, the other to establish a lack of ramping activity. To accommodate the possibility of tone onset and offset responses, we performed our tone-modulation test on a 100 ms long tone presentation window as well as the 100 ms window immediately after tone presentation. The test compared the number of spikes during each of these windows to inter-trial baseline activity as measured by three sequential 100 ms windows preceding tone onset. Three windows were chosen to account for variability in spontaneous spike counts. Given that spike counts are discrete, bounded, and non-normal, we used bootstrapping to evaluate whether the mean change in spikes during tone presentation was sufficiently close to zero (in our case 0.1 spikes). We subsampled 90% of the spike count changes from baseline, calculated the mean of these values, and repeated this process 5000 times to construct an empirical distribution of means. If 95% of the subsampled means values were between -0.1 and 0.1 we considered the cell sensory non-responsive (p<0.05).
The value of 0.1 spikes was chosen to be conservative as it is equivalent to an expected change of 1 spike every 10 trials. This is a conservative, rigorous method for establishing sensory non-responsiveness that is commensurate with more standard approaches for establishing tone responsiveness such as the z-score.
To quantify the observed sustained increase in firing rate preceding the behavioral response a ramp index was calculated adapted from the 'build-up rate' used in previous literature 31 . First, the trial averaged firing rate was determined in 50 msec bins leading up to the behavioral response. We then calculated the slope of a linear regression in a 500 msec long sliding window beginning 850 msec before behavioral response. The maximum value of these slopes was used as the 'ramp index' for each cell. Cells were classified as choice non-responsive if the ramp index did not indicate an appreciable change in the firing rate (less than 50% change) established via bootstrapping. Cells that were shown to be both sensory and choice non-responsive were considered nominally non-responsive overall ( Fig. 4a,b, red circles).

Additional firing statistics
Spontaneous average firing rate was established by averaging spikes in a 100 msec time window immediately prior to tone onset on each trial. To quantify tone modulated responses observed during stimulus presentation, we calculated z-scores of changes in spike count from 100 msec before tone onset to 100 msec during tone presentation: = where is the mean change in spike count and is the standard deviation of the change in spike count.

Analysis of receptive field properties
Receptive fields were constructed by calculating the average change in firing rate from 50 ms before tone onset to 50 ms during tone presentation. The window used during tone presentation was identical to that used to calculate the z-score. Best frequency was defined as the frequency where the largest positive deviation in the evoked firing rate was observed.
Tuning curve bandwidth was determined by calculating the width of the tuning curve measured at the mean of the maximum and minimum observed evoked firing rates.

In vivo whole-cell recordings
Sprague-Dawley rats 3-5 months old were anesthetized with pentobarbital. Experiments were carried out in a sound-attenuating chamber. Series of pure tones (70 dB SPL, 0.5-32 kHz, 50 msec, 3 msec cosine on/off ramps, inter-tone intervals between 50-500 msec) were delivered in pseudo-random sequence. Primary AC location was determined by mapping multiunit responses 500-700 µm below the surface using tungsten electrodes. In vivo whole-cell voltage-clamp recordings were then obtained from neurons located 400-1100 In vitro whole-cell recordings Acute brain slices of AC or FR2 were prepared from 2-5 month old Sprague-Dawley rats.
Animals were deeply anesthetized with a 1:1 ketamine/xylazine cocktail and decapitated.

ISI-based single-trial Bayesian decoding
Our decoding method was motivated by the following principles: First, single-trial spike timing is one of the only variables available to downstream neurons. Any observations about trial-averaged activity must ultimately be useful for single-trial decoding, in order to have behavioral significance. Second, there may not be obvious structure in the trialaveraged activity to suggest how non-responsive cells participate in behaviorally-important computations. This consideration distinguishes our method from other approaches that rely explicitly or implicitly on the PSTH for interpretation or decoding 4,10,11,35-37 . Third, we required a unified approach capable of decoding from both responsive and non-responsive cells in sensory and frontal areas with potentially different response profiles. Fourth, our model should contain as few parameters as possible to account for all relevant behavioral variables (stimulus category and behavioral choice). This model-free approach also distinguishes our method from others that rely on parametric models of neural activity.
These requirements motivated our use of ISIs to characterize neuronal activity. For non-responsive cells with PSTHs that displayed no systematic changes over trials or between task conditions, the ISI distributions can be variable. The ISI defines spike timing relative to the previous spike and thus does not require reference to an external task variable such as tone onset or behavioral response. In modeling the distribution of ISIs, we use a non-parametric Kernel Density Estimator that avoids assumptions about whether or not firing occurs according to a Poisson (or another) parameterized distribution. We used maximum likelihood to estimate the bandwidth of the Kernel Density Estimator in a datadriven manner. Finally, the use of the ISI was also motivated by previous work demonstrating that the ISI can encode sensory information 28-30 and that precise spike timing has been shown to be important for sensory processing in rat auditory cortex 38,39 .
Training probabilistic model: Individual trials were defined as the time from stimulus onset to the response time of the animal (or average response time in the case of no-go trials). Trials were divided into four categories corresponding to each of the four possible variable combinations (target/go, target/no-go, non-target/go, non-target/no-go).
Approximately 90% of each category was set aside as a training set in order to determine the statistical relationship between the ISI and the two task variables (stimulus category, behavioral choice).
Each ISI observed was sorted into libraries according to the stimulus category and behavioral choice of the trial. The continuous probability distribution of finding a particular ISI given the task condition of interest (target or non-target, go or no-go) was then inferred using nonparametric Kernel Density Estimation with a Gaussian kernel of bandwidth set using a 10-fold cross-validation 40  p(target|ISI, t) = p(ISI|target, t)p(target, t) p(ISI|target, t)p(target, t) + p(ISI|non-target, t)p(non-target, t) On the left hand side are the updated beliefs about the probability of a target. When the next ISI is observed this value would be inserted as p(target, t) on the right side of the equation and updated once more. Using the probability normalization, p(non-target, t) can be determined, p(target, t) + p(non-target, t) = 1 Similarly, for choice, p(go|ISI, t) = p(ISI|go, t)p(go, t) p(ISI|go, t)p(go, t) + p(ISI|no-go, t)p(no-go, t) and p(go, t) + p(no-go, t) = 1 Continuing this process over the course of the trial, we obtain four probabilities -one for each of the variable outcomes -as a function of time during the trial: p(target, t), p(nontarget, t), p(go, t), and p(no-go, t). At each moment, the total probability of both stimuli and both choices are 1. The prediction for the entire trial was assessed at the end of the trial. For comparison to choice probability and latency decoding, the certainty was ignored and the preferred variable was taken as the prediction with 100% certainty.
The overall likelihood for a spike train is then simply equal to product of the likelihoods for each ISI observed over the course of the trial, We used 10-fold cross-validation, meaning the trials in the four stimulus categories were randomly divided into ten parts and each part took a turn acting as the test set with the remaining 90% of trials acting as a training set. To estimate the statistical certainty of these results we used bootstrapping with 124 repetitions (except in the case of the null hypotheses where 1240 repetitions were used).
Ensemble decoding: Ensemble decoding proceeded very similarly to the single-unit case. The ISI probability distributions for each neuron in the ensemble were calculated independently as described above. However, while decoding a given trial, the spike trains of all neurons in the ensemble were used to simultaneously update the beliefs about stimulus category and behavioral choice. In other words, p(stimulus, t) and p(choice, t) were shared for the entire ensemble but each neuron updated them independently using Bayes' rule whenever a new ISI was encountered. The joint likelihood of observing a set of ISIs during a trial is then the product of the likelihoods of each neuron independently. where pj is the likelihood of observing a given set of ISIs from neuron j.

Synthetic spike trains
To test the null hypothesis that the ISI-based single-trial Bayesian decoder performance was indistinguishable from chance, synthetic spike trains were constructed for each trial of a given unit by randomly sampling with replacement from the set of all observed ISIs regardless of the original task variable values (synthetic spike trains, Fig. 4e). In principle under this condition, ISIs should no longer bear any relationship to the task variables and decoding performance should be close to 50%. For single-unit responses, this randomization was completed 1240 times. Significance from the null was assessed by a direct comparison to the 124 bootstrapped values observed from the true data to the 1240 values observed under the null hypotheses. The p-value was determined as the probability of finding a value from this synthetic condition that produced better decoding performance than the values actually observed as in a standard permutation test.
As a secondary control, we used a traditional permutation test whereby observed spike trains were left intact, but the task variables that correspond to each spike train were randomly permuted (condition permutation, Fig. 4f). This process was completed 1240 times.

Rate-modulated Poisson decoding
To decode using the trial-averaged firing rate, we implemented a standard method 31 which uses the probability of observing a set of n spikes at times t1, … , tn assuming those spikes were generated by a rate-modulated Poisson process (Supplementary Fig. 12). First, we use a training set comprising 90% of trials to estimate the time-varying firing rate for each condition from the PSTH ( target ( ), non-target ( ), go ( ), no-go ( )) by Kernel Density Estimation with 10-fold cross-validation. The remaining 10% of spike trains are then decoded using the probability of observing each spike train on each condition assuming they were generated according to a rate-modulated Poisson process Collecting the first and last terms relating to trial start and trial end as With the exception of the terms relating to trial start and end, we can then view the likelihood of a spike train as resulting from the likelihood of the individual ISIs (just as with our ISI-decoder), with the key difference that these ISI probabilities are inferred from the firing rate rather than estimated directly using non-parametric methods.

Inferring the ISI distribution predicted by a rate-modulated Poisson process
To compare the ISI distribution inferred using non-parametric methods to one predicted by a rate-modulated Poisson process we use the relationship above to calculate the predicted probability of observing an ISI of given length within the 1 second window used for our non-parametric estimates. The formula above assumes a spike has already occurred at time t, so we multiply by the probability of observing a spike at time t, p( | target) = target ( ), to obtain the total probability of finding an ISI at any given point in the trial. In other words, the probability of observing an ISI beginning at time t is simply the probability of observing spikes at times t and t + ISI with silence in between.
The probability of observing an ISI at any time within a time window spanning wi to wf is simply the integral of this ISI probability as a function of time across the window.
To ensure the final spike occurs before wf the integral spans wi to (wf -ISI), where C is a normalization constant which ensures p(ISI | wi, wf, target) integrates to 1,

Regression based method for verifying multiplexing
For each cell, we fit a Logit model for both the stimulus and choice decoding probabilities on individual trials with the true stimulus category and behavioral choice as regressors. We then calculated the extent to which the stimulus decoding probability was determined by true stimulus category by subtracting the regression coefficient for stimulus from that of choice (Supplementary Fig. 11a, x-axis, stimulus selectivity index); when this number is positive it indicates that stimulus was a stronger predictor of stimulus decoding on a trialby-trial basis. The same process was repeated for choice ( Supplementary Fig. 11a, y-axis, choice selectivity index). According to this analysis we took multiplexed cells to be those that were positive for both measures (Supplementary Fig. 11a, orange symbols, 19/90 cells). In other words, multiplexed cells were cells for which stimulus decoding probabilities were primarily a result of true stimulus category and choice decoding probabilities were primarily a result of true behavioral choice.
Given the moderate negative correlation for these indices we projected each of these points onto their linear regression to create a one-dimensional regression-based uniplexing index. Cells with a value near zero are the multiplexed cells described above and cells with positive or negative values are primarily stimulus or choice selective (Supplementary Fig. 11a).
We compared the uniplexing values produced by this regression method to those produced by examining only the average decoding performance for stimulus and choice ( Supplementary Fig. 11b). A decoding-based uniplexing index was defined as the difference between average stimulus and choice decoding for each cell. When these two values are comparable this measure returns a value close to zero and the cell is considered multiplexed; moreover, cells that are uniplexed for stimulus or choice receive positive and negative values respectively just as with the regression based measure. While the overall magnitude of these two measures need not be related, both measures of multi/uniplexing rank cells on a one-dimensional axis from choice uniplexed to multiplexed to stimulus uniplexed centered on zero.

Weighted log likelihood ratio
The log likelihood ratio (LLR) was calculated by first calculating the conditional ISI probabilities and then taking the difference of the logarithm of these distributions. For

Consensus and unsigned consensus
The consensus value evaluates the extent to which the LLR (or weighted LLR) is shared across an ensemble. It is the norm of the sum of the LLRs (W. LLRs) divided by the sum of the norms. In principle, the functional norm can be anything but in this case we used the ℓ1 norm (the absolute area under the curve), The for an n-member ensemble, the consensus is then The consensus is then calculated over each these sets and the maximum value is taken to be the value of the unsigned consensus.

Network elements
We construct a network of N recurrently connected "firing rate" model neurons to perform a facsimile of the auditory discrimination task. Each model neuron is characterized by an activation variable 8 and nonlinear response function ( ) = tanh ( ) that describes its firing rate. N = 1000 neurons for Fig. 7. Our results are independent of network size (tested on a range from N=1000-10,000) and instead depend on the relative fraction of neurons with specific response properties (to be described).
We denote the connections between model neurons by the NxN synaptic weight matrix (Fig. 7a) The

Design of Inputs and Network Outputs
Auditory input, ℎ 8 = 8 in is provided to 10% of the network units through the vector of input weights, . To represent the auditory stimuli used in the experiment, in is modeled as a square pulse of duration 100 ms and amplitude proportional to the frequency of the sound used experimentally, ranging from 0.5 kHz to 32 kHz in octave increments (see left panel of Fig. 7a). On each trial the network receives one of these "auditory" inputs. After network training, it should respond only to the "target tone" of 4 kHz with an output pulse modeled by out ( ) and produce no response to the other, "non-target tones" (see right panel of Fig. 7a).
The output of the network ( ) is defined as a sum of unit firing rates from the 90% of the units not directly driven by the tone, and weighted by a vector of readout weights, w, such that ( ) = ∑ y y ( y ). Successful performance is achieved by matching ( ) to the overall output out ( ) by modifying the readout weights w using recursive least squares 38 (see also section 3 below) .
In addition to the auditory inputs ℎ 8 , a fraction (75% of N) of the network units sometimes receive stochastic noise during training to ascertain through decoding that they were nominally non-responsive units (see section 3 below). This injected noise varies randomly and independently at every time step as a Gaussian random variable between 0 and 1, drawn from a zero mean and unit variance distribution. Its amplitude is scaled by

Network Training and Evaluating Performance
During training, inputs of individual neurons whose weights are plastic are compared directly with target functions to compute a set of error functions. In fullFORCE 38 , these target functions are generated by the units in an almost identical, auxillary network in which there are no plastic synapses. However, this network receives the target output ‡ˆP ( ) as an external input, which allows it to perform the task trivially. The recurrent connections of the auxillary network are also sampled as described above, but with a g value between 1.2 and 1.5, which have been shown to be effective for this method 41 .
Only the weights between 25% of the network units are modified through The behavioral performance of the trained network is computed as a percentage of trials in which the network produced a pulsed output (green trace in the right panel of Fig.   7a) in response to the target tone (input schematized in the left panel of Fig. 7a) and produced 0 in response to the non-target tone (purple trace in right panel of Fig. 7a). A typical trained network of 1000 neurons was able to perform the task to a percent correct level of 100% (Fig. 7b). Performance was evaluated in the same way after silencing neurons in the responsive or nominally nonresponsive cohort, or alternatively, in a randomly selected fraction of network neurons (Fig. 7c,d).
The predicted decrease in decoding performance for a random subset of the network seen in panel Fig. 7d ( pred. ) was derived from the psychometric inactivation curves of each subpopulation treated independently.

Analysis of selection rule encoding from Rodgers & DeWeese 2014
Using our novel ISI-based decoding algorithm, we analyzed cells found to be nonresponsive in a previously published study 5 . Briefly, rats were trained on a novel auditory stimulus selection task where animals had to respond to one of two cues while ignoring the other depending on the context. Rats held their nose in a center port for 250 to 350 ms and were then presented with two simultaneous sounds (a white noise burst played from only the left or right speaker and a high or low pitched warble played from both speakers). In the "localization" context animals were trained to ignore the warble and respond to the location of the white noise burst and in the "pitch" context they were trained to ignore the location of the white noise burst and respond to the pitch of the warble. Cells recorded from both primary auditory cortex and prefrontal cortex (prelimbic region) were shown to be responsive to the selection rule during the pre-stimulus period (i.e. firing rates differed between the two contexts). Non-responsive cells were reported but not further analyzed.
We confirmed that these nominally non-responsive cells satisfied our own positive statistical criteria for non-responsiveness (described above) by comparing their average spiking activity in the 100 ms immediately preceding stimulus onset across contexts. To determine whether the prestimulus activity of nominally non-responsive cells also encoded the selection rule, we decoded the task context (localization vs. pitch) on single-trials using our ISI-based decoding algorithm. Cells shown in Fig. 5b were deemed statistically significant when compared to the decoding performance of a control using synthetically generated data (p<0.05).

Statistical analysis
All statistical analyses were performed in Python, MATLAB, or GraphPad Prism 6.
Datasets were tested for normality, and appropriate statistical tests applied as described in the text (e.g., Student's paired t-test for normally distributed data, Mann-Whitney U test and Wilcoxon matched-pairs signed rank test for non-parametric data).

Figure 1.
Recording from AC or FR2 during go/no-go audiomotor task. a. Behavioral schematic for the go/no-go frequency recognition task. Animals were rewarded with food for entering the nose port within 2.5 seconds after presentation of a target tone (4 kHz) or given a 7-second timeout if they incorrectly responded to non-target tones (0 Error bars represent standard deviation.