Cell-type-specific population dynamics of diverse reward computations

SUMMARY Computational analysis of cellular activity has developed largely independently of modern transcriptomic cell typology, but integrating these approaches may be essential for full insight into cellular-level mechanisms underlying brain function and dysfunction. Applying this approach to the habenula (a structure with diverse, intermingled molecular, anatomical, and computational features), we identified encoding of reward-predictive cues and reward outcomes in distinct genetically defined neural populations, including TH+ cells and Tac1+ cells. Data from genetically targeted recordings were used to train an optimized nonlinear dynamical systems model and revealed activity dynamics consistent with a line attractor. High-density, cell-type-specific electrophysiological recordings and optogenetic perturbation provided supporting evidence for this model. Reverse-engineering predicted how Tac1+ cells might integrate reward history, which was complemented by in vivo experimentation. This integrated approach describes a process by which data-driven computational models of population activity can generate and frame actionable hypotheses for cell-type-specific investigation in biological systems.

(B) Spatial position of cells from each of the 6 clusters determined in Figure 1E. Scale bar = 100 μm.
(C) Quantification of overlap of expression from STARmap for 4 genes also quantified by in situ hybridization in Figure 1I. Amplicon counts for each gene were Z scored, and cells with Z score > 0.5 were denoted as expressing that gene. Grayscale indicates the proportion of cells expressing Gene 1 that also express Gene 2. Fractional overlap is listed inside each box. n = 1440 neurons.
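The thresholding and overlap computation described in (C) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy counts, and the use of the population standard deviation for Z scoring are all assumptions, since the legend does not specify them.

```python
from statistics import mean, pstdev

def zscore(values):
    """Z score a list of per-cell amplicon counts (population SD is an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def fractional_overlap(counts, threshold=0.5):
    """counts: dict mapping gene name -> per-cell counts, same cell order for all genes.

    Returns a dict keyed by (gene_1, gene_2) holding the proportion of cells
    expressing gene_1 that also express gene_2. The matrix is not symmetric.
    """
    expressing = {g: [z > threshold for z in zscore(c)] for g, c in counts.items()}
    overlap = {}
    for g1 in counts:
        n1 = sum(expressing[g1])
        for g2 in counts:
            both = sum(a and b for a, b in zip(expressing[g1], expressing[g2]))
            overlap[(g1, g2)] = both / n1 if n1 else float("nan")
    return overlap

# Hypothetical toy counts for six cells and two placeholder genes.
toy_counts = {
    "gene_a": [9, 8, 0, 1, 0, 7],
    "gene_b": [0, 1, 9, 8, 0, 6],
}
overlap = fractional_overlap(toy_counts)
```

Because each row is normalized by the number of cells expressing Gene 1, the diagonal entries equal 1 whenever at least one cell passes the threshold.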
(D) Quantification of the spatial coexpression of genes from in situ hybridization in Figures 1H and 1I. Contours indicate the distribution of cells expressing Gene 1 that also express Gene 2, in 50 μm bins, normalized to the max expression of Gene 1. Diagonal represents spatial expression of individual genes.

Figure S3. Cues and rewards modulate cell-type-specific activity in the habenula, related to Figure 3
(A) In another version of the 3-CSRTT, reward sizes were varied. Correct trials resulted in a withheld reward (15% of trials, black), a 10 μL reward (55% of trials, green), or a 20 μL reward (30% of trials, purple).
(B-D) Reward-related activity in Hb cell types as a function of reward size. Mean Z scored fluorescence data aligned to reward port entry and separated by reward volume. Tac1+ neurons show a decrease in activity at larger-than-expected rewards. Small rewards vs none: LHb, p < 0.01; Tac1, p < 0.01. Small vs large reward: LHb, p = 0.1; Tac1, p = 0.03. TH-Cre, n = 4 mice, 960 trials; Tac1-Cre, n = 7 mice, 3572 trials; LHb, n = 4 animals, 1270 trials. Tac1-Cre animals include those with AAV1 injections and fiber placements targeted to the MHb, which had an average of 82% of neurons in the MHb.
(E) Example behavioral tracking of a TH-Cre mouse in the 3-CSRTT. Arrows indicate head direction in correct, rewarded trials, pseudocolored by the mean fluorescence for that video frame.
(F) Z scored fiber photometry data, aligned to time from the turn toward the reward port, calculated as the first frame where the head has swept to within 45° of the reward port.
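The alignment event in (F), the first frame at which the head direction comes within 45° of the reward port, could be located along the following lines. This is a sketch under stated assumptions: head direction and port bearing are taken to be available per frame as angles in degrees, the boundary is treated as inclusive, and the function names are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def first_turn_frame(head_angles_deg, port_angle_deg, tolerance_deg=45.0):
    """Index of the first frame whose head direction is within tolerance of the port.

    Returns None if the head never comes within tolerance during the trial.
    """
    for i, angle in enumerate(head_angles_deg):
        if angular_difference(angle, port_angle_deg) <= tolerance_deg:
            return i
    return None

# Hypothetical head directions for one trial, with the port at 0 degrees.
turn_frame = first_turn_frame([180, 150, 100, 50, 40, 10], port_angle_deg=0)
```

Taking the difference modulo 360 keeps the test correct when the head direction wraps past 0° or 360°.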
(G) Fiber photometry data aligned to the illumination of the house light after time out from incorrect or premature trials.
(H) Activity of Hb cell types outside of trial-based operant training. Animals previously trained on the 3-Choice Task were given free rewards in sessions that contained no trial structure. Rewards were pseudorandomly delivered at 10-50 s intervals. In 75% of trials, rewards were cued with the reward port light previously associated with reward; the remaining 25% were uncued. t = 0 represents head entry into the reward port. Mean Z scored fluorescence aligned to reward delivery for cued and uncued rewards. Colors indicate cued (green) or uncued (light blue) rewards. Th+, p < 0.05 by t test. Th-Cre, n = 5 mice; Tac1-Cre, n = 7 mice; ChAT-Cre, n = 5 mice; Calb1-Cre, n = 5 animals; LHb, n = 4 animals. Tac1-Cre animals include those with AAV1 injections and fiber placements targeted to the MHb, which had an average of 82% of neurons in the MHb.

Figure S6. Extended data on the state space analysis, related to Figure 6
(A) All single-trial trajectories and fixed points for Tac1+ trLFADS models for mouse #1 and #2. The three axes, identified by a targeted dimensionality reduction approach, were orthonormalized. Colored dots indicate slow points found by fixed point analysis. The line is the top PC through the space of slow points. Note the presence of line attractor dynamics in both mice and that the total activity mode and the line attractor are highly aligned. The rest of the analyses with Tac1+ neurons are visualized with mouse #1.
(B) Sets of single-trial trajectories for the TH+ trLFADS model for mouse #1. The trials were grouped in time to highlight the absence of the activity accumulation feature seen in Tac1+ mice. The rest of the analyses with the trLFADS model for TH+ neurons are visualized with mouse #1.
(C) Explained variance in the state space plots for the trLFADS models for Tac1+ neurons. The ordered components from targeted dimensionality reduction are the total activity mode, the condition-independent mode, and the line attractor mode.
(D) Single-trial inferred external input for Tac1+ trLFADS model shows trial-type-dependent temporal profile, specifically after the cue onset. Mean ± SEM.
(E) The distinct inferred external input for Tac1+ trLFADS model results in larger shifts of total activity in rewarded trials than in unrewarded trials, implying trial-type-dependent integration of reward history. Wilcoxon rank-sum test; *p < 0.01.
(F) Single-trial external inputs for TH+ neurons also show trial-type-dependent temporal profile, although these inputs are not integrated due to the discrete fixed point arrangement.

Figure S7. Extended data on model-guided experiments, related to Figure 7
(A) Average firing changes in rewarded (green), unrewarded (black), and perturbation (red) trials. Curves, mean; shaded error, SEM from hierarchical bootstrap.
(B) Zoomed-in visualization of (A) for each trial type. 2 s windows were used for baseline subtraction (−2 to 0 s) and within-trial firing rate change quantification (−2 to 0 s; 5 to 7 s).
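The windowed firing-rate change in (B) and a hierarchical bootstrap SEM of the kind mentioned in (A) can be sketched as follows. The legend does not state the levels of the hierarchy or the window boundary conventions, so this example assumes two levels (units such as animals or neurons, then trials within each unit) and half-open windows; all function names and data are hypothetical.

```python
import random
from statistics import mean, stdev

def window_mean(times, rates, start, stop):
    """Mean rate over samples with start <= t < stop (half-open window is an assumption)."""
    return mean(r for t, r in zip(times, rates) if start <= t < stop)

def firing_rate_change(times, rates, early=(-2.0, 0.0), late=(5.0, 7.0)):
    """Late-window mean minus early-window mean, using the 2 s windows from the legend."""
    return window_mean(times, rates, *late) - window_mean(times, rates, *early)

def hierarchical_bootstrap_sem(groups, n_boot=1000, seed=0):
    """groups: list of lists; outer level = units, inner level = per-trial values.

    Each iteration resamples units with replacement, then trials within each
    sampled unit with replacement. The SEM is the SD of the bootstrap means.
    """
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        sampled = [rng.choice(groups) for _ in groups]
        values = [rng.choice(g) for g in sampled for _ in g]
        boot_means.append(mean(values))
    return stdev(boot_means)

# Hypothetical trace: 1.0 before t = 0 and 3.0 afterwards, sampled every 0.5 s.
times = [-2.0 + 0.5 * i for i in range(19)]
rates = [1.0 if t < 0 else 3.0 for t in times]
change = firing_rate_change(times, rates)
sem_varied = hierarchical_bootstrap_sem([[0.0, 1.0], [4.0, 5.0]], n_boot=200)
```

Resampling at the unit level before the trial level keeps trials from the same unit together, so the error estimate reflects between-unit variability rather than treating every trial as independent.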
(C) In silico dynamics simulation of TH+ neurons predicts no reward history accumulation regardless of the reward probability, as expected.
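The contrast between a line attractor that accumulates reward history and a discrete fixed point arrangement that does not can be illustrated with a toy one-dimensional system. This is not the trLFADS model: the update rule, the `leak`, `pulse`, and `steps_per_trial` parameters, and the function name are assumptions chosen only to show the qualitative difference.

```python
import random

def simulate_trials(n_trials, reward_prob, leak, steps_per_trial=50, pulse=1.0, seed=0):
    """Toy 1-D dynamics with a reward-driven input pulse at the start of each trial.

    Between inputs the state evolves as x <- (1 - leak) * x.
    leak = 0: every x is a fixed point (a line attractor), so pulses accumulate.
    leak > 0: x = 0 is the only stable fixed point, so pulses decay away.
    Returns the state at the end of each trial.
    """
    rng = random.Random(seed)
    x = 0.0
    end_of_trial = []
    for _ in range(n_trials):
        if rng.random() < reward_prob:
            x += pulse  # rewarded trial: transient external input
        for _ in range(steps_per_trial):
            x *= (1.0 - leak)
        end_of_trial.append(x)
    return end_of_trial

# Same reward schedule (same seed) applied to both toy systems.
integrating = simulate_trials(20, reward_prob=0.75, leak=0.0)
non_integrating = simulate_trials(20, reward_prob=0.75, leak=0.2)
```

In the toy integrator the end-of-trial state counts the rewards received so far, whereas in the leaky system it returns to near zero after every trial for any reward probability, which mirrors the absence of accumulation described for the TH+ simulation.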