Parallel systems for sound processing and functional connectivity among layer 5 and 6 auditory corticothalamic neurons

Cortical layers (L) 5 and 6 are populated by a spatially intermingled menagerie of neurons with distinct inputs and downstream targets. Here, we made optogenetically guided recordings from L5 and L6 corticothalamic (CT) neurons in the mouse auditory cortex to discern underlying patterns of functional connectivity and sensory processing in the largest sub-cerebral projection system. Whereas L5 CT neurons showed broad stimulus selectivity with sluggish response latencies and extended temporal non-linearities, L6 CTs exhibited sparse sound feature selectivity and rapid temporal processing. L5 CT spikes lagged behind neighboring units and imposed weak feedforward excitation within the local column. By contrast, L6 CT spikes drove robust and sustained activity in neighboring units. Our findings underscore a duality among CT projection neurons, where L5 CT units are canonical broadcast neurons that integrate sensory inputs for transmission to distributed downstream targets, while L6 CT neurons are positioned to regulate thalamocortical response gain and selectivity.


Introduction
Corticofugal neurons fall into two distinct classes: intratelencephalic and sub-cerebral 42 ( The largest compartment of the corticofugal projection system comes from neurons in 50 layer 5 and 6 of the cortex that innervate the thalamus (Ojima 1994 Until recently, technical limitations have made it difficult to extend observations made in 76 fixed tissue or acute brain slice to the realm of sensory processing in awake animals. Here, we leveraged advances in multi-channel electrophysiology and optogenetics to make targeted 78 recordings of ACtx CT neurons in awake mice. We show that L5 CT neurons utilize dense, nonlinear coding of sound features and have little influence on local cortical processing whereas L6 80 CT neurons have sparse selectivity for sound features, and strongly modulate local processing within ACtx. These findings indicate that each class of CT projection neuron performs distinct 82 operations on incoming sensory signals, which likely impart distinct effects on their downstream targets. 84

Two types of corticothalamic projection neurons
As a first step towards highlighting the differences in L5 and L6 CT neurons, we first 88 wanted to confirm that the well-established patterns of sub-cerebral connectivity in other brain areas and species could be reprised in the mouse auditory system. To visualize L5 CT neurons, 90 we used an intersectional virus strategy that exploited their divergent axons fields that also innervate the tectum (Asokan et al. 2018). We first injected canine adenovirus 2 (CAV2-Cre) 92 into the inferior colliculus, which selectively infects local axon terminals to express retrogradely in upstream projection neurons (N=6 mice, Figure 1A: left) (Soudais et al. 2001;Schneider et al. 94 2014; Liu et al. 2016). We then made a second injection of a Cre-dependent virus in the ipsilateral ACtx to label the corticotectal axon fields as well as other potential downstream 96 targets ( Figure 1A: left). As expected, we observed a smattering of smaller neurons that project to the midbrain in L6b, but a majority cells in L5b with prominent apical dendrites ( Figure 1A: 98 left) (Schofield 2009). In addition to dense terminal labeling in the external and dorsal cortex of the IC (Figure 1C: left), we also observed terminal labelling in the MGBd (Figure 1B: left). 100 These findings confirmed previous reports that corticotectal projection neurons could also be described as CT projection neurons, with the caveat that we do not know what fraction of L5 102 corticotectal cells are also corticothalamic (Bajo et al. 1995;Rouiller & Welker 2000).
We used the Ntsr1-Cre transgenic mouse to label L6 CT neurons based on our prior 104 report that 97% of Ntsr1-positive neurons in the ACtx are L6 CT, and 90% of all L6 CT neurons are Ntsr1-positive (Guo et al. 2017). We injected a Cre-dependent virus into the right ACtx, 106 allowing a fluorescent reporter to be expressed throughout L6 CT axon fields ( Figure 1A: right).
As expected, we observed labelled cell bodies in layer 6a with a band of neuropil labeling in L5a 108 ( Figure 1A: right) and a dense plexus of axon terminal labelling in the MGB, predominantly in the ventral subdivision ( Figure 1D: right). Labelling was not observed in the ipsilateral IC 110 ( Figure 1C: right), indicating that the Ntsr1-Cre expressing cells are separate from other classes of L6 projection neurons that have more distributed local and subcerebral targets (Briggs 2010;112 Sundberg & Granseth 2018).

Antidromic phototagging of ACtx projection neurons
To make targeted recordings from L6 CT neurons, we expressed ChR2 in the ACtx of 116 Ntsr1-Cre mice and implanted an optic fiber such that the tip was positioned lateral to the MGB (Figure 2A). Because the intersectional labeling experiment demonstrated that L5 CT neurons 118 projecting to the MGBd also innervate the inferior colliculus, we targeted L5 CT neurons by implanting an optic fiber atop the dorsal cap of the inferior colliculus of WT mice ( Figure 2B). 120 We made extracellular recordings from awake, head-fixed mice (N=12) using high-density 32channel silicon "edge" probes that primarily spanned L5 and 6 ( Figure 2C). Spikes were sorted 122 into single-unit clusters (n=1,246) using Kilosort (Pachitariu et al. 2016). We used the characteristic pattern of laminar current sinks and sources to assign each recorded neuron to L5 124 (n=509) or L6 (n=625, Extended Figure 2A To identify L5 and L6 CT projection neurons, we optogenetically activated their axon 128 terminals and documented the temporal patterning of antidromic action potentials ( Figure 2D).
Consistent with previous observations, we hypothesized that antidromically activated spikes with 130 very low trial-to-trial jitter in arrival time were non-synaptic events reflecting direct photoactivation, whereas high-jitter spikes likely arose from intracortical synaptic transmission 132 within the ACtx (Li et al. 2015;Jennings et al. 2013;Lima et al. 2009). To verify this, we blocked local glutamatergic transmission within ACtx with local application of NBQX, an 134 AMPA receptor antagonist ( Figure 2E) in a subset of anesthetized control mice (N=4, n=188).
NBQX did not affect the low-jitter mode of the distribution but eliminated spikes with variable 136 first spike latencies, confirming that the low-jitter mode of the distribution reflected direct activation of the recorded projection neuron, whereas higher jitter spikes arose through local 138 polysynaptic activation ( Figure 2E: top). Based on our pharmacology control experiments, we operationally identified recordings of antidromically phototagged L5 and L6 CT neurons in our 140 awake recordings as spikes with laser-evoked first spike latency jitter of 0.3 ms SD or less ( Figure 2E: bottom; L5 CT: n=132, L6 CT: n=83). As further evidence of this, we estimated the 142 laminar locations of the phototagged populations and confirmed that they matched the expected anatomy (Extended Figure 2D). Few of our phototagged CT neurons had spike shapes in the FS 144 range (L5 CT: n=6, L6 CT: n=8), suggesting that these projection neurons are largely separate from the recently described population of parvalbumin-expressing GABA projection neurons 146 (Zurita et al. 2018;Rock et al. 2017).

Sensory characterization of CT projections
To characterize sensory processing differences in optogenetically identified L5 and L6 150 CT units, we first quantified pure tone frequency response areas (FRAs). Only neurons that had well-defined FRAs were included for analysis (L5: n=174, L6: n=172, L5 CT: n=44, L6 CT: 152 n=27). From each FRA, we computed the tuning bandwidth (20 dB above threshold, Figure 3A) and the tone-evoked onset latency ( Figure 3B). While there were no overall differences in tuning 154 bandwidth between neurons in L5 and 6, L5 CT neurons were more broadly tuned than L6 CT neurons (Wilcoxon Rank Sum test, p=0.03, Figure 3C). Neurons in L5, including the L5 CTs, 156 exhibited tone-evoked first spike latencies that were approximately twice as long as L6 (Wilcoxon Rank Sum test; layer: p<2x10 -9 , cell-type: p<4x10 -4 , Figure 3D). 158 As a next step, we generated a set of spectrotemporally modulated noise bursts, that varied in center frequency (4-64 kHz, 0.1 octave steps), spectral bandwidth (0-1.5 octaves, 0.1 160 octave steps), level (0-60 dB SPL, 10 dB SPL steps), and sinusoidal amplitude modulation rate (0-70 Hz, 10 Hz steps) ( Figure 3E). We then used a standard measure of sparsity to quantify the 162 shape of each cell's response distribution ( Figure 3F) ( Chambers et al. 2014). This lifetime sparseness index is bounded between 0 and 1, with 164 values close to 1 reflecting selectivity for a sparse set of stimuli ( Figure 3F: left) and values close to 0 reflecting a broad response distribution ( Figure 3F: right). Responses in L6 were sparser 166 than L5 (Wilcoxon Rank Sum test, p<3x10 -3 , Figure 3G) and complex sound selectivity was sparser in L6 CTs than L5 CTs, in agreement with the differences in pure tone tuning bandwidth 168 (Wilcoxon Rank Sum test, p=0.02, Figure 3G). Tuning bandwidth and sparsity were inversely  Figure 4A) and layer ( Figure 4B) tuning parameters and 176 tested, using cross-validation, how well either the correct cell-type or layer could be predicted.
Classification was better than chance (50%) in either case, although classification errors were 178 significantly lower based on projection cell type than for cortical layer (32% vs 23%, Bootstrapped statistical test, p<0.05, Figure 4C). This indicates that variance in sensory response 180 properties can be better predicted by classification of neuron type, rather than just cortical layer.

Modeling the stimulus-response function of CT neurons
Fully characterizing the stimulus-response transformation of a neuron is an intractable 184 problem as it would require knowing the neural responses to all possible stimuli. A common approach is to present a complex stimulus that spans a sizeable subset of the possible stimulus 186 space, and to then relate the stimulus to the resultant neural response using mathematical models. As a final step, we calculated the directionality of spike train covariance to test the idea 242 that L5 CTs are integrators that pool inputs from local cells before broadcasting signals out of the cortex, whereas L6 CTs control columnar gain by driving local cell types. In this case, we 244 hypothesized that L6 CT spikes would temporally lead spikes in local FS and RS cell types, whereas L5 CT spikes would lag behind local RS and FS spikes. To test this prediction, we first 246 determined which spike train correlations were significant using a confidence bound, before zscoring the cross-covariance functions and ordering them by the location of their peak ( Figure  248 7F). The proportion of lead-preferring interactions was similar across L5 CT and RS/FS pairs, and L6 CT and RS pairs, at approximately 48%. By contrast, 65% of the L6 CT and FS pairs 250 were lead-preferring indicating a preferential interaction between L6 CT neurons and neighboring FS GABA neurons. In addition to the temporal peak of the cross-covariance 252 function, we also analyzed the amount of cross-covariance associated with each direction of interaction. We averaged both sides of the z-scored cross-covariance functions before computing 254 a lead-lag asymmetry index by calculating (lead-lag)/(lead+lag) yielding a value whose distance from 0 indicates either lead-preferring (>0) or lag-preferring. We observed that L6 CT spikes 256 tend to lead FS spikes, while the reverse is true for L5 CT spikes (Wilcoxon Rank Sum test; L5CT<->FS (n=56) vs L6CT<->FS (n=54), p<8x10 -4 , Figure 7H) Ultimately, these findings shed light on how information contained in two primary output 314 layers of the ACtx can be routed through the thalamus, to the rest of the brain. The L6 CT neurons propagate information that is sparse and more selective, but their anatomy places them 316 in a position to modulate information both en route to the thalamus (through the TRN), but also locally through dense communication with the FS population. Such a circuit allows for rapid 318 modulation of response gain and has led to speculation that L6 CT circuits play a crucial role in dynamically regulating stimulus salience according to internal state variables such as anticipation 320 and attention (Guo et al. 2017;Homma et al. 2017).
Thanks in part to their elaborate dendritic structure, L5 CT neurons are able to integrate 322 input from multiple ACtx layers and broadcast a dense, non-linear signal to multiple sub-cortical auditory stations. The large terminals deposited by L5 CT neurons in the MGBd will likely lead 324 to increased spike reliability which will, in turn, ensure that a reliable signal is propagated from higher-order thalamus to higher-order areas of ACtx (Sherman 2012). The impact that these 326 signals have on all target areas also need not be the same, as different downstream synaptic properties could lead to a postsynaptic variation in response. 328 Ascending sensory pathways are classically characterized as two streams: a lemniscal system for higher fidelity propagation of detailed stimulus information and a non- 370 Implantation of optic fibers. Mice were brought to a surgical plane of anesthesia, using the same protocol for anesthesia and body temperature control described above. The dorsal surface 372 of the skull was exposed, and the periosteum was thoroughly removed. If the animal was to undergo awake recordings, the skull was then prepared with 70% ethanol and etchant (C&B 374 Metabond) before attaching a custom titanium head plate (eMachineShop) to the skull overlying bregma with dental cement (C&B Metabond). For L5 CT phototagging, a small craniotomy (0.5 376 mm x 0.5 mm, medial-lateral x rostral-caudal) was made (0.25 mm caudal to the lambdoid suture, 1 mm lateral to midline), to expose the IC. A ferrule and optic fiber assembly was 378 positioned to rest atop the IC and then cemented to the surrounding skull (C&B Metabond). For mice undergoing L6 CT phototagging, a small craniotomy (0.5 x 0.5 mm, medial-lateral x 380 rostral-caudal) was made under stereotaxic guidance, centered 2.75 mm lateral to midline and 2.75 mm caudal to bregma. A ferrule with a longer optic fiber (~3 mm) was implanted 2.7 mm 382 below the brain surface to illuminate axons innervating the MGB. Once dry, the cement surrounding the fiber implant was painted black with nail polish to prevent light from escaping. 384 Mice recovered in a warmed chamber and were housed individually.

386
Acoustic and optogenetic stimulation Acoustic stimulation. Stimuli were generated with a 24-bit digital-to-analog converter (National 388 Instruments model PXI-4461) using custom scripts programmed in MATLAB (MathWorks) and LabVIEW (National Instruments). Stimuli were presented via a free-field electrostatic speaker 390 (Tucker-Davis Technologies) facing the left (contralateral) ear. Free-field stimuli were calibrated before recording using a wide-band ultrasonic acoustic sensor (Knowles Acoustics, model 392 SPM0204UD5).

394
Light delivery. Collimated blue light (488 nm) was generated by a diode laser (Omicron, LuxX) and delivered to the brain via an implanted multimode optic fiber coupled to the optical patch 396 cable by a ceramic mating sleeve. Laser power through the optic fiber assembly was calibrated prior to implantation with a photodetector (Thorlabs). 398 Neurophysiology Awake, head-fixed preparation. Before the first awake recording session, mice were briefly 400 anesthetized with isoflurane (4% induction, 1% maintenance) before using a scalpel to make a small craniotomy (0.5 mm x 0.5 mm, medial-lateral x rostral-caudal) atop the ACtx, at the caudal 402 end of the right squamosal suture centered 1.5 mm rostral to the lambdoid suture. A small chamber was built around the craniotomy with UV-cured cement and a thin layer of silicon oil was applied 404 to the surface of the brain. The mouse was then brought into the recording chamber and the head was immobilized by attaching the headplate to a rigid clamp (Altechna). We waited at least 30 406 minutes before starting neurophysiology recordings. All recordings were performed in a singlewalled sound-attenuating booth lined with anechoic foam (Acoustic Systems). 408 At the end of each recording session, the cement chamber was flushed with saline, filled with antibiotic ointment (Bacitracin) and sealed with a cap of UV-cured cement. The chamber was 410 removed and rebuilt under isoflurane anesthesia before each subsequent recording session.
Typically, 3-5 recording sessions were performed on each animal over the course of 1 week. 412 Data acquisition and spike sorting. At the beginning of each session, a 32-channel, single-shank, 414 silicon probe with 20 µm between contacts (NeuroNexus A32-Edge-5mm-20-177-Z32) was inserted into the ACtx perpendicular to the brain surface using a micromanipulator (Narishige) and 416 a hydraulic microdrive (FHC). Once inserted, the brain was allowed to settle for 10-20 minutes before recording began. We ensured that recordings were made from the core fields of ACtx (either 418 A1 or AAF) based on the tonotopic gradient, response latencies, and frequency response area shape (Guo et al. 2012). 420 Raw neural signals were digitized at 32-bit, 24.4kHz (RZ5 BioAmp Processor; Tucker-Davis Technologies) and stored in binary format for offline analysis. The signal was bandpass 422 filtered at 300-3000 Hz with a second-order Butterworth filter and movement artifacts were minimized through common-mode rejection. To extract local field potentials, the raw signals were 424 notch filtered at 60 Hz and downsampled to 1000 Hz. Signals were then spatially smoothed across channels using a 5-point Hanning window. We computed the second spatial derivative of the local 426 field potential to define the laminar current source density (CSD), which was used to assign each recorded unit to layer 5 or 6 (Guo et al., (2017), Extended Figure 2). 428 Spikes were sorted into single-unit clusters using Kilosort (Pachitariu et al. 2016). All data files from a given penetration were concatenated and sorted together so that the same unit could 430 be tracked over the course of the experiment (~90 minutes), and to ensure that a unit drifting across contacts could be accurately assigned to the same cluster. Single-unit isolation was based on the 432 presence of both a refractory period within the inter-spike-interval histogram, and an isolation distance (>10) indicating that single-unit clusters were well separated from the surrounding noise 434 Once isolated, single units were classified as RS, FS, L5 CT or L6 CT. For the FS and RS 436 classification, the trough-to-peak interval of the mean spike waveform formed a bimodal distribution, which we used to subdivide neurons with intervals exceeding 0.6 ms as RS, while 438 neurons with intervals shorter than 0.5 ms were FS (Extended Figure 2C). We used an optogenetic approach to classify a subset of recorded units as either L5 CT or L6 CT. A 1 ms laser pulse (with 440 power typically ranging from 10-50 mW) was presented between 250 and 1000 times at a rate of 4 Hz, to antidromically activate ACtx neurons. We then fit an analysis window around the laser-442 evoked spikes responsive neurons (operationally defined as recordings where the firing rate increased by at least 5 SD from the pre-laser baseline firing rate). We then computed the temporal 444 jitter as the standard deviation of the first spike that occurred within the responsive window ( Figure   2). 446 where , -represents firing rates, and 9 indexes time. This index is defined between zero (less sparse) and one (more sparse), and depends on the shape of the response distribution :(,). 482 Support vector machine classification. To determine how well tuning parameters could be used to correctly decode either cell-type or layer, we used a binary support vector machine 484 classifier with a linear kernel. We fitted the classifier model to a data matrix consisting of N observations of 3 tuning parameters (bandwidth, latency, and sparsity). Each observation was 486 associated with either a positive or negative class (which was used to indicate either cell-type or layer). Ten-fold cross-validation was then used to compute a misclassification rate. The SVM 488 training and cross-validation procedure was carried out in MATLAB using the 'fitcsvm', 'crossval', and 'kfoldLoss' functions. Uncertainty in the misclassification rates was then 490 quantified using a bootstrapping procedure to compute the 95% confidence intervals for both cell-type and layer. A statistically significant difference between groups was established if the 492 classification accuracy of one group fell outside the bootstrapped 95% confidence interval of the other. 494 Statistical models of complex sound responses. We presented a 4-64 kHz dynamic random chord (DRC) stimulus (Linden, (2003)). This spectrotemporally rich stimulus is clocked such 496 that every 20 ms a combination of 20 ms cosine-gated tone pulses with randomly chosen frequencies and intensities is generated. Frequencies were discretized into 1/12 octave bins 498 spanning 4-64 kHz, and levels were discretized into 5 dB SPL bins spanning 25-70 dB SPL. The number of tones in each 20 ms chord was random, with an average density of 2 tone pulse per 500 octave. A 1 min DRC was generated, and this was repeated for 20 trials, allowing for 20 min of continuous sound presentation. 502 We binned our PSTHs into 20 ms bins (to match the temporal resolution of the DRC) to An estimate of the stimulus-independent trial-to-trial variability, the noise power, was obtained by subtracting this expression from [(= (>) ).^^^^^^^^^ The DRC stimulus drove comparable spiking 524 variability in both CT cell types (two-sample t-test, p=0.31, Extended Figure 5A-B). Noise power was also not different, (two-sample t-test, p=0.73, Extended Figure 5C) indicating that 526 trial-to-trial variability was similar between groups.
The predictive performance of both models was evaluated using "predictive power", an explainable variance metric that is normalized by the signal power (Sahani & Linden 2003b). It 530 is therefore a measure of performance that explicitly takes trial-to-trial variability into account, and provides a means to quantify how much stimulus-related variability can be explained by a 532 model. Its expected value for a perfect model is 1, and for a model predicting only the mean firing rate it is 0. Generalization performance of the models was assessed using 10-fold cross-534 validation.
As described previously, predictive performance on both training and test data depended 536 systematically on the amount of trial-to-trial variability in the recording (Extended Figure 5D Following these previous studies, we obtained population-level predictive performance estimates by extrapolating to the "zero-noise" limit, effectively averaging across the 540 population while discounting the variable impact of noise on each neuron. These extrapolated limits bracket the true average predictive power of the model. 542 To further investigate the relevance of estimated CGF structure, we quantified the primary modes of variability around the mean using principal components analysis (PCA). In all 544 groups, the scatter around the mean was concentrated in the first two or three principal components (PCs). The first three PCs were able to account for almost 60% of the total 546 variability (Extended Figure 5F). The structure present within these PCs reflected the consistent features observed in the population CGFs. Namely, the dominant effect in the first PC was to 548 modulate the overall depth of simultaneous/near-simultaneous enhancement, and in the second PC was to modulate the overall depth of delayed suppression (Extended Figure 5G). 550