Distinct signals in medial and lateral VTA dopamine neurons modulate fear extinction at different times

Dopamine (DA) neurons are thought to encode reward prediction error (RPE), in addition to other signals, such as salience. While RPE is known to support learning, the role of salience in learning remains less clear. To address this, we recorded and manipulated VTA DA neurons in mice during fear extinction. We applied deep learning to classify mouse freezing behavior, eliminating the need for human scoring. Our fiber photometry recordings showed DA neurons in medial and lateral VTA have distinct activity profiles during fear extinction: medial VTA activity more closely reflected RPE, while lateral VTA activity more closely reflected a salience-like signal. Optogenetic inhibition of DA neurons in either region slowed fear extinction, with the relevant time period for inhibition differing across regions. Our results indicate salience-like signals can have similar downstream consequences to RPE-like signals, although with different temporal dependencies.


Introduction
A critical function of VTA DA neurons is to signal reward prediction error (RPE), or the difference between experienced and expected reward (Schultz, Dayan, and Montague 1997;Cohen et al. 2012) . This signal is necessary and sufficient to mediate reinforcement learning (Steinberg et al. 2013;Chang et al. 2016;Tsai et al. 2009;Witten et al. 2011;Parker et al. 2016;Saunders et al. 2018;Zweifel et al. 2009;Kim et al. 2012;Stopper et al. 2014;Adamantidis et al. 2011) . However, rather than uniformly representing a scalar RPE, there is recent appreciation that VTA DA neurons can display heterogeneous and spatially organized signals. For example, we recently demonstrated that during navigation-based decision making, there are spatially segregated representations within VTA DA neurons of a variety of sensory, motor, and cognitive variables (Engelhard et al. 2019) . Others have demonstrated neural correlates of salience-like signals in certain DA neurons (de Jong et al. 2019;Saddoris et al. 2015;Wang and Tsien 2011;Gore, Soden, and Zweifel 2014;Yuan, Dou, and Sun 2019;Cho et al. 2017;Aitken, Greenfield, and Wassum 2016) .
Although most work examining neural correlates of behavior in VTA DA neurons has focused on reward-based tasks, several studies have recorded VTA DA neuron activity during aversive associations (Robinson et al. 2019;Lutas et al. 2019;Wang and Tsien 2011;Mileykovskiy and Morales 2011) . In particular, VTA DA neurons were shown to represent RPE-like signals during fear extinction, in that they display elevated activity when the shock is omitted at the offset of the cue, signaling better-than-expected outcome (Salinas-Hernández et al. 2018;Badrinarayan et al. 2012;Jo, Heymann, and Zweifel 2018) . Manipulation of this activity altered the rate of extinction, suggesting that an RPE-like signal in DA neurons drives reinforcement learning for aversive associations as well as rewarding associations (Salinas-Hernández et al. 2018;Luo et al. 2018) .
Here, we first seek to determine if in addition to RPE-like signals, there are also heterogeneous and spatially organized signals in VTA DA neurons during fear extinction. One possibility is that there are neural correlates of salience, and not only RPE, in subregions of the VTA during fear extinction. Salience can be considered an "unsigned" prediction error; in other words, it is the absolute value of RPE (Bromberg-Martin, Matsumoto, and Hikosaka 2010; Rutledge et al. 2010). During fear extinction, a neural correlate of salience should have elevated activity during the tone that has been paired with a footshock, and elevated activity at the tone offset, when the expected footshock is not presented.
Spatially segregated RPE versus salience signals in VTA DA neurons would provide a scenario to determine whether or not such distinct signals have similar or different effects on learning, an important question that has not been definitively answered in any behavioral paradigm. One hypothesis is that a salience signal may support learning, similar to an RPE signal. Alternatively, salience signals may acutely modify behavior rather than drive learning.
An advantage of utilizing fear extinction to address these questions is that it provides a continuous readout of learning via a mouse's freezing, allowing us to examine the precise temporal relationship between DA and the expression of learned behavior. However, one limitation of freezing as a readout of learning is that it traditionally requires hand scoring when mice are tethered to neural headgear, as existing software confuses tether movement with mouse movement (Luyten et al. 2014;Shoji et al. 2014) . The need for human labeling has often restricted the analysis of freezing to specific epochs, such as the presentation of the conditioned stimuli.
To overcome this limitation, and provide an automated and unbiased measure of freezing, we developed an analysis pipeline that uses deep learning to automatically identify freezing behavior. We combine this approach with fiber photometry to characterize the spatial distribution of RPE and salience correlates across the medial-lateral axis of the VTA during fear extinction.
Finally, we performed optogenetic inhibition of DA neurons in each VTA subregion to assess if, when and how these signals affect fear extinction.

Development and application of a convolutional neural network (CNN) to identify freezing behavior
We developed an analysis pipeline based on a convolutional neural network (CNN) to identify freezing behavior in mice. The CNN was initialized on the pre-trained ResNet18 architecture (He et al. 2016) , and further trained on "difference images," the pixel-by-pixel intensity difference between consecutive pairs of frames. The rationale for inputting difference images to the CNN was to capture frame-by-frame motion. Each difference image was hand-labeled as 1 or 0 to signify "freeze" or "no freeze," and the network learned to predict labels for new difference images ( Figure 1A-C, Supplementary Figure S1A). We trained a CNN for each of two different experimental chambers (fear conditioning chamber and fear extinction chamber) and two different neural headgears (fiber photometry and optogenetics). Each classifier achieved optimal training within 50 training epochs (Supplementary Figure S1B-E) and yielded 92%-96% accuracy, 5-10% false positive rate (FPR) and 4-6% false negative rate (FNR) ( Figure 1D). This was comparable to the relative performance between two humans: given one person's scoring held as ground truth, the other person scored with 95% accuracy, 11.5% FPR and 0.3% FNR (Supplementary Figure  S1F).
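The two quantities described above, difference images and classifier error rates, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the signed (rather than absolute) difference is an assumption, and FPR and FNR are computed with the standard definitions FP/(FP+TN) and FN/(FN+TP), which the text does not spell out. The ResNet18 training step itself is omitted.

```python
from typing import Dict, List, Sequence

Frame = List[List[int]]  # grayscale frame stored as rows of pixel intensities


def difference_image(prev: Frame, curr: Frame) -> Frame:
    """Pixel-by-pixel intensity difference between two consecutive frames.

    Assumption: signed difference (curr - prev); the source does not say
    whether the absolute value is taken.
    """
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def difference_images(frames: Sequence[Frame]) -> List[Frame]:
    """One difference image per consecutive pair: N frames give N - 1 images."""
    return [difference_image(a, b) for a, b in zip(frames, frames[1:])]


def classifier_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, FPR and FNR for binary labels (1 = freeze, 0 = no freeze)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(truth),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

The same `classifier_metrics` call can score one human observer against another by treating the first observer's labels as `truth`, which is how the human-versus-human comparison above is framed.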
We compared the CNN performance with the popular proprietary software FreezeFrame ( Figure 1E-G) on an additional 33,000 frames from several mice. Since FreezeFrame produces a second-by-second readout of freezing, we calculated the mean freezing for each second from both the CNN and from human-labeled frames to create a comparable second-by-second readout. We found that the CNN better reflected the human observer than FreezeFrame ( Figure  1E). The CNN and human observer yielded a correlation of 0.96 (Pearson correlation, Figure 1F), while FreezeFrame and human observer yielded a correlation of 0.85 (Pearson correlation, Figure  1G). In comparison, the correlation between two human observers was 0.98 (Pearson correlation, Supplementary Figure S1F).
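A minimal sketch of the second-by-second comparison follows, assuming per-frame binary labels and the 11 fps frame rate given in the Figure 1 legend. The function names are hypothetical, and dropping a trailing partial second is an assumption rather than a documented choice.

```python
from math import sqrt
from typing import List, Sequence

FPS = 11  # frame rate stated in the figure legend


def freezing_per_second(frame_labels: Sequence[int], fps: int = FPS) -> List[float]:
    """Mean of the per-frame labels (1 = freeze) within each full second.

    Assumption: a trailing partial second is dropped.
    """
    n_seconds = len(frame_labels) // fps
    return [sum(frame_labels[s * fps:(s + 1) * fps]) / fps for s in range(n_seconds)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Applying `freezing_per_second` to both the CNN labels and the human labels, then passing the two resulting series to `pearson_r`, gives a correlation of the kind reported above.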
Taken together, our pipeline provides an automatic, fast and effective method for scoring freezing. In this paper, we used the CNN to analyze over 500 hours of behavioral data during fear conditioning and extinction, which would have been prohibitively time consuming without an automated approach.

Neural activity in medial and lateral VTA DA neurons during fear extinction correlates with RPE and salience, respectively
We performed fear conditioning and extinction ( Figure 2A) while simultaneously performing fiber photometry to record from VTA DA neurons. On day 1, mice were presented with ten tones of 20 s duration ("habituation"), followed by ten 20 s tones that coterminated with a 1 s, 0.5 mA foot shock ("conditioning"). On days 2 to 4, mice were presented with twenty-one 20 s tones alone each day ("extinction"). Mice froze very little during habituation, quickly increased freezing during conditioning, and slowly decreased freezing to the tone over three days of extinction ( Figure 2B).
We next sought to determine if in VTA subregions, DA neurons correlated with RPE or salience. During fear extinction, we expect a neural correlate of RPE to have suppressed activity during a tone that has been paired with a foot shock to signal worse than expected outcome, and elevated activity during the tone offset to signal better than expected outcome because the footshock was omitted. A neural correlate of salience, which can be considered an "unsigned prediction error," or the absolute value of RPE, should instead have elevated activity during the tone that has been paired with a footshock, and elevated activity at the tone offset, when the shock was unexpectedly omitted ( Figure 2C).
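The contrast between the two predicted signals can be made concrete with a toy Rescorla-Wagner-style simulation. This is an illustrative sketch only, not a model used in the study: the starting value `v0`, the learning rate `alpha`, and the choice to treat the tone-evoked response as equal to the learned tone value are all assumptions.

```python
from typing import List, Tuple


def simulate_extinction(
    n_trials: int = 21, v0: float = -1.0, alpha: float = 0.1
) -> List[Tuple[float, float, float, float]]:
    """Return (tone RPE, offset RPE, tone salience, offset salience) per trial.

    v is the learned value of the tone; it starts negative because the tone
    was paired with a footshock (assumed v0 = -1).
    """
    v = v0
    rows = []
    for _ in range(n_trials):
        tone_rpe = v            # tone predicts shock: worse than expected
        offset_rpe = 0.0 - v    # shock omitted (outcome 0): better than expected
        # Salience is the unsigned prediction error, |RPE|.
        rows.append((tone_rpe, offset_rpe, abs(tone_rpe), abs(offset_rpe)))
        v += alpha * offset_rpe  # value relaxes toward 0 across extinction
    return rows
```

In this sketch the RPE signal is negative during the tone and positive at tone offset, the salience signal is positive at both times, and all four quantities shrink toward zero across trials, which matches the qualitative predictions in Figure 2C.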
To perform fiber photometry recordings from VTA DA neurons, we expressed the calcium indicator GCaMP6f in DA neurons by crossing Dat::IRES-Cre mice with Ai148D mice (see Methods; Engelhard et al. 2019). We targeted the recording fiber to either medial or lateral VTA ( Figure 3A-C, Supplementary Figure S2A; medial VTA group: n = 10 mice, lateral VTA group: n = 11 mice). We chose these subregions because of the electrophysiological, anatomical and functional evidence that there is a medial/lateral distinction within the VTA (Lammel, Lim, and Malenka 2014;Beier et al. 2015;Yang et al. 2018;Engelhard et al. 2019;de Jong et al. 2019) .
During habituation and fear conditioning, medial and lateral VTA DA neuron signals were similar, but could not be easily explained as purely RPE or salience (Supplementary Figure S2D, E top two rows, and S2F-H). GCaMP6f fluorescence in both regions decreased throughout the tone during habituation and conditioning. Both regions also showed increased fluorescence to the shock (Supplementary Figure S2B and C).
During fear extinction, we observed that medial versus lateral VTA cell bodies preferentially reflected RPE versus salience, respectively (Figure 3D-F, Supplementary Figure S3D, E bottom three rows). Medial VTA DA neuron activity was consistent with RPE (Figure 2C): GCaMP6f fluorescence decreased throughout the tone that had been associated with a negative outcome, and increased at the offset of the tone, during the omission of the expected shock (Figure 3E; percentile rank of the mean GCaMP6f fluorescence every second relative to shuffled data: p < 0.01 after Bonferroni correction for multiple time point comparisons). Both of these signals diminished in magnitude throughout extinction, as expected with RPE.
In contrast, during fear extinction, lateral VTA DA neuron activity was more consistent with salience ( Figure 2C). GCaMP6f fluorescence increased at the tone onset, consistent with an unsigned prediction error ( Figure 3F; percentile rank of the mean GCaMP6f fluorescence every second relative to shuffled data: p < 0.01 after Bonferroni correction for multiple time point comparisons). GCaMP6f fluorescence also increased at the tone offset, consistent with the idea that the end of an event is salient. Both these signals also diminished throughout extinction, as the tone loses its salience.
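The per-second significance test mentioned above (percentile rank against shuffled data with Bonferroni correction) might look roughly like the sketch below. The shuffling scheme is an assumption, since this section does not describe it: here each trial is circularly shifted by a random offset. The two-sided comparison and all function names are likewise hypothetical.

```python
import random
from typing import List, Sequence


def mean_trace(trials: Sequence[Sequence[float]]) -> List[float]:
    """Average across trials at each 1 s time bin."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def significant_bins(
    trials: Sequence[Sequence[float]],
    n_shuffles: int = 1000,
    alpha: float = 0.01,
    seed: int = 0,
) -> List[bool]:
    """Flag time bins whose mean deviates from a shuffled null distribution."""
    rng = random.Random(seed)
    observed = mean_trace(trials)

    # Null distribution (assumed scheme): circularly shift every trial by a
    # random offset, then recompute the across-trial mean trace.
    null = []
    for _ in range(n_shuffles):
        shifted = []
        for trial in trials:
            k = rng.randrange(len(trial))
            shifted.append(list(trial[k:]) + list(trial[:k]))
        null.append(mean_trace(shifted))

    threshold = alpha / len(observed)  # Bonferroni correction across time bins
    flags = []
    for t, obs in enumerate(observed):
        null_t = [m[t] for m in null]
        center = sum(null_t) / len(null_t)
        # Two-sided p-value: fraction of shuffles at least as extreme as observed.
        extreme = sum(1 for v in null_t if abs(v - center) >= abs(obs - center))
        flags.append(extreme / len(null_t) < threshold)
    return flags
```

Note that the number of shuffles bounds the smallest attainable p-value, so `n_shuffles` has to be large relative to the Bonferroni-corrected threshold for any bin to be flagged.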
To control for possible artifacts in these recordings, we recorded from DAT-Cre mice expressing Cre-dependent GFP (AAV5-DIO-eGFP or AAV5-DIO-eYFP) in medial or lateral VTA. There was little modulation in the signal in the control mice. The exception was the time of the shock, which generated depressed fluorescence, which was the opposite of the increased fluorescence observed with GCaMP6f in medial and lateral VTA (Supplementary Figure S3). This suggests that our conclusions above were not due to a recording artifact.

Medial and lateral VTA DA neuron activity during extinction correlate distinctly with freezing on a trial-by-trial basis and across animals
We next characterized the correlation between freezing and GCaMP6f fluorescence during fear extinction in each subregion, both on a trial-by-trial basis, and across animals ( Figure 4).
In the medial VTA DA neurons, we observed a negative correlation between freezing during the tone and GCaMP6f fluorescence during the tone onset of the same trial (Figure 4A; one sample t-test p = 0.021, n = 10 mice), but a positive correlation between freezing during the tone and GCaMP6f fluorescence at the tone offset of the same trial (Figure 4A; one sample t-test p < 10^-4, n = 10 mice). These correlations are consistent with the interpretation that medial VTA DA encodes an RPE-like signal during fear extinction. Specifically, the negative correlation between fluorescence during the tone and freezing is consistent with the idea that the degree of inhibition in DA during the tone reflects the learned negative association with the tone (Mileykovskiy and Morales 2011). Similarly, the positive correlation between DA after the tone and freezing is consistent with the idea that the fluorescence reflects the degree of "relief" that the animal experiences when the expected shock is omitted. These same correlations that we observed across trials were also evident when correlating trial-averaged medial VTA GCaMP6f fluorescence and freezing across animals (Figure 4B, C). Mice with lower average GCaMP6f fluorescence during the tone onset or higher average GCaMP6f fluorescence at the tone offset froze more to the tone (Pearson's correlation between GCaMP6f 5 s after tone onset and freezing during tone: r = -0.644, p = 0.044; GCaMP6f 5 s after tone offset and freezing during tone: r = 0.768, p = 0.010).
In contrast, in the lateral VTA, we observed a positive correlation across trials between freezing throughout the tone and GCaMP6f during the tone onset for each trial (Figure 4D; one sample t-test p < 10^-4, n = 11 mice), but not the tone offset (Figure 4D; one sample t-test p = 0.071). This is consistent with the interpretation that fluorescence during the tone in lateral VTA reflects the salience of the tone, given that more salient stimuli should elicit more freezing. We observed similar trends across mice (Figure 4E, F; Pearson's correlation between GCaMP6f during tone onset and freezing during tone: r = 0.583, p = 0.060; Pearson's correlation between GCaMP6f after tone offset and freezing during tone: r = 0.665, p = 0.025).
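The trial-by-trial analysis in this section has two stages: a Pearson correlation per mouse across trials, then a one-sample t-test asking whether the per-mouse coefficients differ from zero. A minimal sketch under those assumptions is given below; the function names and data layout are hypothetical, and only the t statistic is computed because the standard library has no t-distribution (a p-value could be obtained from a statistics package such as SciPy's `scipy.stats.ttest_1samp`).

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def trialwise_correlations(
    per_mouse: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> List[float]:
    """One r per mouse from (freezing per trial, fluorescence per trial) pairs."""
    return [pearson_r(freezing, fluorescence) for freezing, fluorescence in per_mouse]


def one_sample_t(values: Sequence[float], popmean: float = 0.0) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: mean(values) == popmean."""
    n = len(values)
    t = (mean(values) - popmean) / (stdev(values) / sqrt(n))
    return t, n - 1
```

Passing the output of `trialwise_correlations` to `one_sample_t` reproduces the structure of the test reported for Figure 4A and 4D, with one coefficient per mouse as the unit of analysis.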

At the tone offset, inhibition of medial but not lateral VTA DA neurons slows fear extinction
To investigate whether medial or lateral VTA DA activity is necessary for extinction learning, we used optogenetics to inhibit these subregions at specific time points. We injected Cre-dependent NpHR (AAV2/5 DIO-eNpHR3.0-EYFP) or YFP control virus (AAV2/5 DIO-EYFP) into the VTA of DAT-Cre mice and bilaterally implanted optic fibers above medial or lateral VTA.
We next examined the effect of VTA DA neuron inhibition during fear extinction in separate cohorts of mice with fibers in medial or lateral VTA; each cohort had separate groups of NpHR- or YFP-virus injected mice (Figure 5E-I, Supplementary Figure S5A-B; medial VTA cohort: NpHR group n = 14, YFP group n = 10; lateral VTA cohort: NpHR group n = 8, YFP group n = 8). Both cohorts underwent fear conditioning, and during fear extinction, received inhibition lasting for 6 s, from the last second of the tone to 5 s after tone offset.
Inhibiting the medial VTA at tone offset yielded an effect consistent with the representation of RPE in this subregion: NpHR mice froze more to the tone compared to the YFP controls (Figure 5F, G, Supplementary Figure S5D; 2-factor mixed ANOVA with group and tone number as factors: group effect: F (1,62) = 7.626, p = 0.011; group × tone number interaction: F (1,62) = 1.071, p = 0.333). In addition to affecting extinction of the tone, NpHR mice increased freezing after the inhibition period compared to YFP controls (2-factor mixed ANOVA with group and tone number as factors: group effect: F (1,62) = 10.270, p < 0.004; group × tone number interaction: F (1,62) = 0.625, p = 0.990). Thus, not only do medial VTA DA neurons lead to updating of the value (or freezing behavior) upon subsequent presentations of the tone that preceded inhibition, they also modify behavior subsequent to the inhibition period.
In contrast to the effects observed with medial VTA inhibition, inhibiting the lateral VTA cohort at tone offset yielded no change in freezing during the tone (Figure 5H, I, Supplementary Figure S5E; 2-factor mixed ANOVA with group and tone number as factors: group effect: F (1,62) = 0.033, p = 0.858; group × tone number interaction: F (1,62) = 1.074, p = 0.329). This manipulation did, however, increase freezing after the inhibition period, an effect that increased with extinction (2-factor mixed ANOVA with group and tone number as factors, 6 s after the inhibition period: group effect: F (1,62) = 7.627, p = 0.015; group × tone number interaction: F (1,62) = 1.394, p = 0.027).
This suggests that lateral VTA inhibition primarily affects freezing at time points after inhibition, rather than causing learned changes in the value of preceding events.

Inhibition of lateral VTA at the tone onset slows extinction learning
We next inhibited lateral VTA DA neurons during fear extinction at the tone onset, given that this is when the fiber photometry recordings showed a pronounced salience-like signal in that subregion (Figure 6A, B, Supplementary Figure S5C; NpHR group n = 12, YFP group n = 10; 6 s inhibition period). We found that NpHR mice extinguished more slowly to the tone. This is reflected in the absence of a main effect of group together with a significant interaction between group and tone number (Figure 6C; Supplementary Figure S5F; 2-factor mixed ANOVA with group and tone number as factors: group effect: F (1,62) = 0.01, p = 0.921; group × tone number interaction: F (1,62) = 2.198, p < 10^-6). Together, this suggests that inhibiting the salience-like signal at tone onset slows extinction learning over time, even though activity of these neurons at tone offset does not update the value of the preceding tone.

Discussion
DA neurons originating in the VTA are known to be modulated by aversive stimuli, and have been implicated in fear conditioning and extinction (El-Ghundi, O'Dowd, and George 2001;Young, Joseph, and Gray 1993;Inoue et al. 2000;Nader and LeDoux 1999;Guarraci and Kapp 1999;Holtzman-Assif, Laurent, and Westbrook 2010;Delgado et al. 2008;Luo et al. 2018;Salinas-Hernández et al. 2018;Zweifel et al. 2011;Pezze and Feldon 2004;Mueller, Bravo-Rivera, and Quirk 2010;Pignatelli et al. 2017;Nasehi et al. 2016;Pezze, Bast, and Feldon 2003;Budygin et al. 2012;Robinson et al. 2019;Lammel et al. 2011;Jo, Heymann, and Zweifel 2018;Lutas et al. 2019;Fadok, Dickerson, and Palmiter 2009;Groessl et al. 2018;Bouchet et al. 2018;Wenzel et al. 2018;Wang and Tsien 2011;Mileykovskiy and Morales 2011). However, it was unclear if and how neural correlates of fear extinction are topographically organized within the VTA. In addition, the causal contribution of spatially localized DA activity within the VTA to fear extinction was unknown. Here, we found that during fear extinction, medial VTA activity more closely resembled RPE, while lateral VTA activity more closely resembled salience. While activity in both subregions contributed causally to fear extinction, the temporal relationship between activity and freezing differed. Consistent with the idea that RPE signals update the value of preceding events, inhibition in medial VTA altered the value (or freezing response) assigned to the tone preceding the inhibition, causing freezing to be more likely on subsequent tone presentations. In contrast, inhibition of the salience-like signal in lateral VTA affected freezing in the time period during or immediately following the inhibition, but did not update the value (or freezing response) to subsequent presentations of the tone if it preceded the inhibition.

Development and application of a CNN to identify freezing behavior
Learned fear is typically quantified by measuring a mouse's freezing. When mice are tethered to neural headgear, existing automated algorithms cannot accurately dissociate movement of the mouse from movement of the tether. To solve this problem, we turned to deep learning and used an off-the-shelf CNN architecture, ResNet, known for its applications in image classification and pose estimation (Nath et al. 2019;Insafutdinov et al. 2016;He et al. 2016) . Our approach is based on the same network architecture as the popular pose estimation software DeepLabCut. However, while DeepLabCut trains on raw video frames, our input image is the pixel intensity difference between consecutive frames. In addition, DeepLabCut adds an additional deconvolutional layer to ResNet to extract pose estimation from multiple joints, which was not necessary for our classification problem. Our CNN allowed accurate and automated classification of freezing behavior throughout the duration of our experiments with minimal labor, and enabled us to determine that the precise temporal relationship between dopamine neuron activity and freezing behavior depended on VTA subregion.

Spatial organization of RPE and salience-like signals in the medial versus lateral VTA
Our finding of RPE-like signals in medial VTA and salience-like signals in lateral VTA during fear extinction may reflect a larger organizational structure across both the VTA and the substantia nigra pars compacta (SNc). Specifically, there is previous evidence that the SNc, which is lateral to the VTA, has a greater proportion of DA neurons that signal salience rather than RPE (Matsumoto and Hikosaka 2009;Menegas et al. 2017) .
On the other hand, it is not obvious how to integrate our spatial findings with recent DA terminal recordings within the NAc, despite the rough topography between VTA cell bodies and their projections (Yang et al. 2018;de Jong et al. 2019;Beier et al. 2015) . In particular, recent studies reported increased activity to aversive stimuli in ventromedial NAc, and decreased activity to aversive stimuli in lateral NAc (de Jong et al. 2019;Yuan, Dou, and Sun 2019) . Assuming topography between VTA cell bodies and their terminals in NAc, we may have expected the opposite result. This discrepancy could be due to a number of factors, including imperfect topography between cell bodies and terminals, differences in the behavioral paradigm, or terminal activity that does not reflect cell body activity (Berke 2018;Threlfell et al. 2012) .
One question is how the presence of salience signals in lateral VTA and SNc may relate to other work showing correlates of kinematics in addition to RPE in those same regions (Engelhard et al. 2019;da Silva et al. 2018;Howe and Dombeck 2016) . One possible relationship between these seemingly disconnected findings of salience versus kinematic tuning is that an animal's speed is a reflection of the motivational salience of an environment (Zénon, Devesse, and Olivier 2016;Panigrahi et al. 2015;A. Y. Wang, Miura, and Uchida 2013) .

Distinct temporal relationships between activity in the medial and lateral VTA and fear extinction
Our manipulations of medial VTA DA neurons produced changes in extinction learning that were consistent with the presence of an RPE-like signal in that subregion. Specifically, the burst of DA at the omission of the expected shock contributed causally to extinction learning, a finding which aligned with previous work (Salinas-Hernández et al. 2018;Luo et al. 2018) .
In addition, this manipulation increased freezing in the time period immediately after the inhibition, suggesting that not only do these neurons regulate learning about the value of a preceding event (the tone), they also affect freezing behavior in the time period subsequent to the manipulation.
While our results in medial VTA were consistent with evidence that RPE signals in DA neurons support reinforcement learning in a variety of paradigms (Steinberg et al. 2013;Witten et al. 2011;Chang et al. 2016;Zweifel et al. 2009;Tsai et al. 2009;Kim et al. 2012;Stopper et al. 2014;Adamantidis et al. 2011) , the causal contribution of salience signals in DA neurons to behavior is much less clear. By leveraging the spatial organization we uncovered between RPE versus salience signals in medial versus lateral VTA, we could directly examine the causal role of salience-like signals. One possibility is that, similar to RPE, salience signals in lateral VTA may also support learning (Bromberg-Martin, Matsumoto, and Hikosaka 2010) .
We found that inhibition of the lateral VTA at the tone offset did not influence extinction learning of the preceding tone. However, similar to medial VTA, this manipulation increased freezing in the timepoints directly after the inhibition, suggesting that despite the fact that these neurons did not regulate learning about the value of a preceding event (the tone), they did affect behavior in the time period subsequent to the manipulation. Consistent with this manipulation affecting subsequent timepoints, inhibition at the tone onset, when these neurons have a burst of activity, had the effect of slowing the rate of tone extinction.
In summary, during fear extinction, medial and lateral VTA DA neurons provide distinct but complementary signals that are both necessary for extinction learning, but at different times. Medial VTA encodes an RPE-like signal, which serves to update the value of the preceding tone. Lateral VTA encodes a salience-like signal at the tone onset, which does not update the value of the preceding tone, but affects freezing during the tone. This is consistent with the emerging framework that despite differences in neural correlates, different VTA/SNc DA subpopulations share a common function of mediating learning, even though they may differ in the specific setting in which they contribute to learning, or the specific aspect of learning that they mediate (Ellwood et al. 2017;Saunders et al. 2018;Cox and Witten 2019;Bromberg-Martin, Matsumoto, and Hikosaka 2010).

Author Contributions
LXC and IBW designed experiments. LXC collected and analyzed data. KP assisted with fiber photometry and optogenetics data collection. GWG and LXC coded the CNN pipeline. CLH hand scored behavioral videos for CNN, and assisted with histology. WTF collected electrophysiology data. LXC and IBW wrote the paper with comments from all the authors. Obtain "difference images," which are the pixel intensity difference between consecutive frames, and hand label a subset as "freeze" or "not freeze." C. Train the CNN classifier to predict "freeze" or "no freeze", see Methods for details. D. Train four separate CNNs for two types of neural headgear (fiber photometry and optogenetics) and two experimental backgrounds (fear conditioning and fear extinction). For each CNN, plot shows its accuracy (Acc), false positive rate (FPR), false negative rate (FNR), false negatives (FN), false positives (FP), true negatives (TN), and true positives (TP). Error bars denote SEM of n different training and test data (n = 23 for fiber photometry conditioning context, n = 6 for fiber photometry extinction context, n = 15 for optogenetics fear conditioning context, n = 6 for optogenetics extinction context), see Methods and Suppl. Figure 1 for details. E-G: Comparison between human observer, CNN and FreezeFrame performance on held-out data in the fiber photometry extinction context. E. Left: A trial from an example mouse showing percent freezing per second as measured by human observer (yellow), CNN (black), and FreezeFrame (red). For hand scoring and CNN, percent freezing per second is the mean value of 11 frames where each frame is assigned '1' for 'freeze' and '0' for 'no freeze' relative to previous frame (video acquired at 11 fps). Right: Data from all trials and 7 mice showing percent freezing per second as measured by human observer (yellow), CNN (black), and FreezeFrame (red). (n = 7 mice, 15 trials per mouse). 
In both subplots, blue lines denote 20 s tone duration and error bars denote SEM.   Auditory fear conditioning and extinction across 4 days: habituation and fear conditioning occur on the 1st day, followed by 3 days of extinction. Habituation and conditioning occur in the same experimental chamber, and extinction occurs in a different experimental chamber. During habituation, mice received 10 tones lasting 20 s each. During fear conditioning, mice received 10 tones that coterminated with a 1 s foot shock. During extinction, mice received 21 tones. In all conditions, the inter-trial interval was jittered with a mean of 80 s. B. Mean freezing during each tone throughout auditory fear conditioning and extinction (n = 21 mice). Error bars denote SEM. C. Schematic of expected neural activity representing reward prediction error (RPE) and salience during fear conditioning and extinction. During extinction, a neural correlate of RPE would have suppressed activity during a tone that has been paired with a footshock to signal worse than expected outcome, and elevated activity during the tone offset to signal better than expected outcome because the footshock was omitted. A neural correlate of salience can be considered an "unsigned prediction error," or the absolute value of RPE.  Example brain slices from different mice showing fiber placement in medial or lateral VTA, at bregma = -3.28 AP. Green denotes GCaMP6f. C. Histology from lateral VTA, staining for cell nuclei (DAPI), tyrosine hydroxylase (TH), and GCaMP6f. D-F: Behavior and VTA DA neuron activity during auditory fear extinction. At the bottom of each subplot, bright blue lines denote tone duration. D.

Left column:
Time course of mean percent freezing for each tone during each day of fear extinction (n = 21 mice). For each extinction day subplot, color intensity corresponds to freezing percentage. Right column : Time course of mean percent freezing over all tones for each extinction day (n=21 mice). Gray shading denotes 1 standard deviation. E. Left column: Time course of mean medial VTA DA neuron activity (GCaMP6f fluorescence) for each tone during each day of fear extinction (n = 10 mice). For each extinction day subplot, color intensity denotes GCaMP6f z-score dF/F. Right column : Time course of mean medial VTA DA neuron activity (GCaMP6f fluorescence) over all tones for each extinction day (n=10 mice). Pink shaded region denotes one standard deviation. Gray shaded region represents 1 s time points where GCaMP6f significantly deviates from shuffled data (percentile rank, p < 0.01 after Bonferroni correction for multiple time point comparisons). F. Same as E but for lateral VTA DA neuron activity (n = 11 mice), using blue instead of pink shading.  Medial VTA trial-by-trial and across animal correlations between freezing and GCaMP6f during fear extinction. In each plot, each dot represents one mouse (n = 10 mice for each plot). A. Trial-by-trial correlations: Pearson's correlation coefficient per mouse between mean freezing during each extinction tone and mean GCaMP6f fluorescence 5 s at tone onset (left) or tone offset (right) of the same tone. Error bars denote one standard deviation. Stars denote correlation coefficients that significantly differ from a mean of 0 using one sample t-test (*p < 0.05, ***p < 0.001). B. Across animal correlations: Correlation across mice between mean freezing during all extinction tones and mean GCaMP6f fluorescence during extinction 5 s at tone onset. Correlation coefficient (r) and p-values on the bottom left of plot; star denotes p < 0.05. C. Same as B, but for mean GCaMP6f fluorescence 5 s after tone offset. D-F. 
Same as A-C, but for lateral VTA trial-by-trial and across animal analysis (n = 11 mice for each plot).
Example brain slice showing fiber placements in medial VTA. Green denotes NpHR3.0-eYFP. G. Percent freezing per second averaged across all tones for each extinction day during medial VTA tone offset inhibition (pink line denotes NpHR group, black line denotes YFP control group). Yellow shaded region denotes the inhibition period. Error bars denote SEM. Stars denote a significant group effect using a 2-factor mixed ANOVA to predict freezing where group and tone number (across all three extinction days) are factors. This analysis was applied to freezing during the tone (group effect: F(1,62) = 7.626, p = 0.011; group × tone number interaction: F(1,62) = 1.071, p = 0.333) and freezing 6 s following the inhibition period (group effect: F(1,62) = 10.270, p < 0.004; group × tone number interaction: F(1,62) = 0.625, p = 0.990). H. Same as F except for lateral VTA (n = 8 mice for NpHR group, n = 8 mice for YFP group). For full AP coordinates of fiber placements, see Supplementary Figure S5B. I.
Same as G, but for lateral VTA, with blue line denoting NpHR group. Stars denote a significant group and interaction effect using a 2-factor mixed ANOVA where group and tone number (across all three extinction days) are factors. This analysis was applied to freezing during the tone (group effect: F(1,62) = 0.033, p = 0.858; group × tone number interaction: F(1,62) = 1.074, p = 0.329) and freezing 6 s following the inhibition period (group effect: F(1,62) = 7.627, p = 0.015; group × tone number interaction: F(1,62) = 1.394, p = 0.027).
A. Examples of "freeze" or "no freeze" difference images from the fiber photometry extinction context used for input into the CNN.
Figure 1D. Right: 2-dimensional histogram to compare human observer 1 to human observer 2 (Pearson correlation coefficient r = 0.98, p = 0, n = 2940 samples). Analogous to Figure 1F, G.
Time course of mean percent freezing for each tone during each experiment condition (n = 8 mice). For each subplot, color intensity corresponds to freezing percentage. Right column: Time course of mean percent freezing over all tones for each extinction day (n = 8 mice). Gray shading denotes 1 standard deviation. D. Left column: Time course of eYFP or eGFP fluorescence in medial VTA for each tone during each experiment condition (n = 4 mice). For each subplot, color intensity denotes GCaMP6f z-score dF/F. Right column: Time course of mean eYFP or eGFP fluorescence over all tones for each extinction day (n = 4 mice). Pink shaded region denotes one standard deviation. Gray shaded region represents 1 s time points where fluorescence significantly deviates from shuffled data (percentile rank, p < 0.01 after Bonferroni correction for multiple time point comparisons). E. Same as D but for lateral VTA eYFP or eGFP fluorescence (n = 4 mice), using blue instead of pink shading.
Figure 5F, H and 6B).
Dots represent individual optic fiber placement: orange dots for mice injected with AAV2/5-DIO-NpHR-eYFP, grey dots for control mice injected with AAV2/5-DIO-eYFP. A. Bilateral fiber placement for medial VTA inhibition during tone off (NpHR = 14, YFP = 10 mice). B. Bilateral fiber placement for lateral VTA inhibition during tone off (NpHR = 8, YFP = 8 mice). C. Bilateral fiber placement for lateral VTA inhibition during tone on (NpHR = 12, YFP = 10 mice). D. Freezing during the tone for each tone number, by experiment condition: habituation, conditioning, extinction day 1, extinction day 2 and extinction day 3. Pink dots denote NpHR group (mice expressing NpHR in dopamine neurons with optic fibers in medial VTA and light delivery at tone offset during all extinction trials, n = 14 mice), and black dots denote YFP control group (mice expressing only YFP in dopamine neurons with optic fibers in medial VTA and light delivery at tone offset during all extinction trials, n = 10 mice). Error bars denote SEM. E. Same as D except the NpHR and YFP mice have optic fibers implanted in lateral VTA: blue instead of pink dots denote NpHR group (n = 8 mice), black dots denote YFP control group (n = 8 mice). F. Same as E except the NpHR group received light inhibition during tone onset instead of offset during all extinction trials (n = 12 mice), as did the YFP control group (n = 10 mice).

Animals:
All experiments followed guidelines established by the National Institutes of Health and were reviewed by the Princeton University Institutional Animal Care and Use Committee (IACUC). Dat::IRES-Cre mice (B6.SJL-Slc6a3tm1.1(cre)Bkmn/J, Jackson Labs) were bred in house with Ai148D mice (Ai148(TIT2L-GC6f-ICL-tTA2), Jackson Labs), and male progeny expressing GCaMP6f in dopamine neurons (Dat::IRES-Cre x Ai148D mice) were used for fiber photometry experiments. Male Dat::IRES-Cre mice were used for optogenetic experiments and electrophysiology recordings. Mice were group housed with up to 5 other mice, allowed ad libitum access to food and water, and kept on a 12-hr light-on and 12-hr light-off schedule. Mice were 7-10 weeks of age at the time of surgery. We conducted all surgery and behavioral experiments during the light-off period.
Stereotactic surgeries: Mice were induced to a surgical plane of anaesthesia using 4% isoflurane and maintained at 0.5-2% isoflurane for the duration of the surgery. Mice were kept warm using a heating pad, and breathing rate was monitored by the surgeon. At surgery onset, mice received i.p. injections of the nonsteroidal anti-inflammatory drug meloxicam (2 mg/kg) and the antibiotic baytril (5 mg/kg). Twenty-four hours post surgery, mice received a second equivalent dose of meloxicam.

Behavioral Assays:
Auditory fear conditioning and extinction: Following a minimum of 6 days to recover from cranial surgery, mice 8-12 weeks of age underwent auditory fear conditioning and extinction. We performed this assay using FreezeFrame (Coulbourn Instruments), and throughout the assay gray-scaled video was recorded at 11 fps while mice were tethered to either fiber photometry or optogenetic cables for neural recording or manipulation, respectively. On day one, mice were placed in a chamber (10 cm x 10 cm) with a conductive metal floor and a smell of ethanol. Mice were habituated to ten 20 s tones (5 kHz, 70 dB), followed by ten 20 s tones that coterminated with a 1 s foot shock administered through the metal floor (1 mA, scrambled). On days two to four, mice underwent extinction in a different but similarly sized experimental chamber with a plastic floor and walls, and a different smell (quatricide). During extinction, mice heard 21 tones per day without the shock. For all experiment days, the inter-trial interval (ITI) between tones was jittered using a Poisson distribution with a mean of 80 s; the ITIs were randomly selected once, prior to all experiments, and were the same for each mouse.
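The ITI scheme described above (Poisson-jittered intervals drawn once and reused for every mouse) can be sketched as follows. This is only an illustration: the function name `make_iti_schedule`, the seed values, and the use of NumPy are assumptions, not the software actually used to program the sessions.

```python
import numpy as np

def make_iti_schedule(n_tones, mean_iti_s=80.0, seed=0):
    """Draw one inter-trial interval (in seconds) per tone from a Poisson distribution.

    Hypothetical helper: a fixed seed makes the schedule reproducible, so the
    same sequence of ITIs can be presented to every mouse.
    """
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=mean_iti_s, size=n_tones)

# One schedule per session type, generated once before the experiments begin.
extinction_itis = make_iti_schedule(n_tones=21, seed=1)
print(extinction_itis.mean())  # close to 80 s on average
```

Fixing the seed is one simple way to guarantee that the schedule is identical across mice while still being randomly jittered across tones.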
Real-time place preference (RTPP): Prior to, or following, the fear conditioning and extinction assay, mice underwent a real-time place preference assay, counterbalanced between groups. Mice were connected to optical fibers and individually placed in a rectangular enclosure with two chambers (28 cm x 28 cm each), and allowed to freely explore the two chambers for 20 minutes. Gray-scaled video was recorded at 30 fps, and mouse location and velocity were tracked throughout the experiment using Ethovision XT 9. For the first 5 minutes, the mice freely explored both chambers but did not receive optical stimulation ("baseline"). For the next 15 minutes, mice continued to have access to both chambers, but presence in one chamber resulted in continuous delivery of laser light (594 nm, 6 mW, Cobalt) to the implanted optic fibers. The "light on" chamber was randomly assigned and counterbalanced within each experimental group (NpHR or YFP).

Fiber Photometry Experiments:
One to two weeks after surgery, mice individually underwent fear conditioning and extinction during fiber photometry recording of VTA DA neurons expressing GCaMP6f. Mice were connected to a fiber photometry setup described in previous reports (Gunaydin et al. 2014). A 488 nm laser light (Micron Technology) was filtered (FL488, Thor Labs), then passed through a dichroic mirror (MD498, Thor Labs) and traveled through a patch cable (Mono Fiberoptic Patchcord, 400 µm core, 0.48 Numerical Aperture, Doric Lenses) coupled via a ceramic split sleeve (2.5 mm diameter, Precision Fiber Products) to the optic fiber implanted in the mouse brain; the light traveled down the optic fiber into the VTA for fluorescence excitation. Laser light delivery was controlled by a lock-in amplifier (Ametek, 7265 Dual Phase DSP Lock-in Amplifier), which delivered light at 210.999 Hz, and the laser intensity at the tip of the patch cable was approximately 5 µW. Fluorescent emission from GCaMP6f at 500-550 nm was then passed through the same patch cable, filtered (MF525-39, Thor Labs), and passed through the same dichroic mirror into a photodetector (Model 2151, New Focus). The signal was filtered at the same 210.999 Hz using the same lock-in amplifier, with a time constant of 20 ms. AC gain on the lock-in amplifier was set to 0 dB. Signal was digitized at 100 Hz. dF/F was calculated by the following formula: $dF/F = (F - F_0)/F_0$, where $F_0$ is the second order polynomial fit to the GCaMP6f signal for the duration of the experimental session, in order to account for the very slow decline of the signal over time, presumably caused by photobleaching. Z-score of dF/F signal was calculated across experiment days for each mouse. To obtain shuffled data, GCaMP6f fluorescence from each experiment day was circularly shifted in time by a random time bin (n = 1000 shuffles).
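As a minimal sketch of the dF/F and z-score steps, the snippet below fits a second order polynomial to a session's raw signal and uses that fit as the baseline. It assumes the conventional form dF/F = (F - F0)/F0; the function names `compute_dff` and `zscore` and the synthetic trace are illustrative only and are not the authors' analysis code.

```python
import numpy as np

def compute_dff(signal, fs_hz=100.0, poly_order=2):
    """dF/F with F0 given by a polynomial fit over the whole session.

    Assumes dF/F = (F - F0) / F0, where F0 tracks slow decline
    such as photobleaching.
    """
    signal = np.asarray(signal, dtype=float)
    t = np.arange(signal.size) / fs_hz            # time in seconds
    coeffs = np.polyfit(t, signal, poly_order)    # second order by default
    f0 = np.polyval(coeffs, t)
    return (signal - f0) / f0

def zscore(x):
    """Z-score a trace; to normalize across days, concatenate the days first."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Synthetic example: a slowly declining baseline with a small oscillation on top.
t = np.arange(6000) / 100.0
baseline = 2.0 - 0.004 * t + 2e-5 * t**2
raw = baseline * (1.0 + 0.05 * np.sin(2 * np.pi * 0.5 * t))
dff = compute_dff(raw, fs_hz=100.0)
z = zscore(dff)
```

Fitting the baseline over the full session, rather than over a sliding window, removes only the very slow trend and leaves tone-locked transients intact.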
The Bonferroni corrections used in Figure 3E,F, Supplementary Figure S2G,H and Supplementary Figure S3D,E corrected for 40 comparisons, one for each of the 40 one-second time bins.
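One possible reading of the shuffle test is sketched below: the tone-aligned mean in each 1 s bin is compared against a null distribution built from circularly shifted copies of the same trace, and a bin is flagged when its empirical p-value falls below 0.01/40. The window placement (`pre_s`, `post_s`), the two-sided empirical p-value, and all function names are assumptions made for illustration; the source specifies only the circular shift, the 1000 shuffles, the percentile rank, and the 40 bins.

```python
import numpy as np

def tone_aligned_bins(trace, onsets, fs_hz, pre_s, post_s):
    """Average the trace in 1 s bins around each tone onset, then across tones."""
    spb = int(round(fs_hz))                 # samples per 1 s bin
    n_bins = pre_s + post_s
    rows = []
    for onset in onsets:
        start = int(onset) - pre_s * spb    # windows must lie inside the trace
        segment = trace[start:start + n_bins * spb]
        rows.append(segment.reshape(n_bins, spb).mean(axis=1))
    return np.mean(rows, axis=0)

def circular_shift_test(trace, onsets, fs_hz=100, pre_s=10, post_s=30,
                        n_shuffles=1000, alpha=0.01, seed=0):
    """Per-bin empirical p-values against circularly shifted data (Bonferroni)."""
    trace = np.asarray(trace, dtype=float)
    rng = np.random.default_rng(seed)
    observed = tone_aligned_bins(trace, onsets, fs_hz, pre_s, post_s)
    null = np.empty((n_shuffles, observed.size))
    for k in range(n_shuffles):
        shift = int(rng.integers(1, trace.size))   # random nonzero shift
        null[k] = tone_aligned_bins(np.roll(trace, shift), onsets,
                                    fs_hz, pre_s, post_s)
    center = null.mean(axis=0)
    # Fraction of shuffles at least as far from the null mean as the real data.
    p = (np.abs(null - center) >= np.abs(observed - center)).mean(axis=0)
    significant = p < alpha / observed.size        # Bonferroni over the bins
    return observed, p, significant

# Synthetic demo: noise plus a 1 s response right after each (jittered) tone onset.
rng = np.random.default_rng(1)
fs = 10
onsets = 300 + np.cumsum(rng.integers(800, 1200, size=21))
trace = rng.normal(size=onsets[-1] + 600)
for onset in onsets:
    trace[onset:onset + fs] += 3.0
observed, p, significant = circular_shift_test(trace, onsets, fs_hz=fs,
                                               n_shuffles=200)
```

With this definition the smallest nonzero p-value is 1/n_shuffles, so a bin passes the 0.01/40 threshold only when no shuffle is as extreme as the observed value.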
Optogenetic Experiments: Approximately 1 month after opsin or control virus injection, mice underwent fear conditioning and extinction. Littermates within a cage were randomly allocated to either the NpHR or YFP group, with as equal a distribution as possible (i.e., in a cage with 4 mice, 2 mice were in the NpHR group and 2 mice in the YFP group; in a cage with 5 mice, 2 mice were in the NpHR group and 3 mice in the YFP group, or vice versa). This ensured that NpHR and YFP mice were interleaved during the experimental sessions and counterbalanced in different behavioral chambers. Masking was not used during group allocation or data collection. Masking was, however, a central part of the behavioral analysis: the CNN scores freezing the same way regardless of the mouse's experimental group. Medial or lateral VTA dopamine neurons were inhibited using continuous 594 nm laser illumination (~6 mW, Cobalt) for 6 seconds during tone offset or tone onset during extinction (Figure 5E and 6A). Video of the behavior was acquired as normal, with the addition of a 400-650 nm filter (Kentek) in front of the camera lens to prevent laser light from appearing in the video. Prior to or following fear extinction, the same mice underwent the real-time place preference (RTPP) assay with the same laser intensity (Supp. Figure S4). For medial VTA tone off and lateral VTA tone on cohorts, an equal number of mice received RTPP before versus after fear conditioning and extinction. For lateral VTA tone off cohorts, all mice received RTPP after the fear conditioning and extinction assay. No explicit power analysis was used to determine sample size for fiber photometry experiments. We decided the sample size (n is approximately 10 for each group) based on what is typically seen in the literature. The approximately 10 mice per group are biological replicates, where a biological replicate is an individual mouse.
For two of the three optogenetic cohorts, two different rounds of mice were run on separate occasions and the data were combined for the full cohort. For the third cohort, all mice were run at once. All mice had at least one fiber in the lateral VTA (from histology images) and were included in statistical analysis. No outliers were discarded. From all cohorts, 1 NpHR mouse was removed from fear extinction analysis because the optic fiber connection slipped off during extinction day 3. For the data in Figure 5G,I and 6C, a 2-factor mixed ANOVA with group (NpHR or YFP) as the between-subjects factor and tone number (1 to 63) as the within-subjects factor was used to determine significance. For the data in Figure S4B-E, a 2-way ANOVA with group (NpHR or YFP) and subregion (medial VTA or lateral VTA) as factors was used to determine significance.
Correlation between GCaMP6f and freezing: No explicit power analysis was used to determine sample size for fiber photometry experiments. We decided the sample size (N is approximately 10 for each group) based on what is typically seen in the literature. No outliers were discarded. For each fiber photometry cohort, 1-4 mice were run on each occasion and all data were aggregated to create the cohorts for the paper. For trial-by-trial analysis, we calculated the Pearson correlation coefficient between GCaMP6f and freezing during extinction for each mouse. Since there were 63 total extinction trials across 3 extinction days, each mouse had 63 values for mean freezing during the tone, correlated with 63 values for mean GCaMP6f fluorescence for 5 s after tone onset or 5 s after tone offset. We next gathered the correlation coefficients from each region (medial or lateral VTA) and GCaMP6f time point (5 s after tone onset or tone offset) and used a one sample t-test to determine if they significantly differed from 0 (Figure 3A and 3D). For across animal analysis, we used Pearson's correlation coefficient and the resulting p-value to assess the correlation between mean freezing per mouse and mean GCaMP6f (either 5 s after tone onset or after tone offset) per mouse. We drew the best fit line using estimates from a generalized linear regression model (Figure 3B,C,E,F).
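The trial-by-trial analysis can be illustrated with a short sketch: one Pearson coefficient per mouse over its 63 extinction trials, followed by a one sample t statistic on the per-mouse coefficients. The data here are synthetic and the helper names `pearson_r` and `one_sample_t` are hypothetical; only the overall procedure comes from the text above.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == popmean."""
    values = np.asarray(values, dtype=float)
    n = values.size
    t = (values.mean() - popmean) / (values.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1

# Synthetic cohort: 10 mice, 63 extinction trials each, with GCaMP6f at tone
# onset constructed to covary negatively with freezing during the tone.
rng = np.random.default_rng(0)
r_per_mouse = []
for _ in range(10):
    freezing = rng.uniform(0, 100, size=63)             # mean % freezing per tone
    gcamp = -0.02 * freezing + rng.normal(size=63)      # mean z-scored dF/F
    r_per_mouse.append(pearson_r(freezing, gcamp))

t_stat, dof = one_sample_t(r_per_mouse)
print(f"mean r = {np.mean(r_per_mouse):.2f}, t({dof}) = {t_stat:.2f}")
```

The p-value follows from the t distribution with `dof` degrees of freedom; `scipy.stats.ttest_1samp` returns both the statistic and the p-value in one call if SciPy is available.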

Histology:
To confirm the location of viral targeting and optical fiber implant, mice were anesthetized with Euthasol (0.1 mL/mouse) and perfused with 10 mL of phosphate buffered saline (PBS) followed by 10 mL of 4% paraformaldehyde (PFA) in PBS. Brains were extracted, post-fixed with 4% PFA for 24 hours, then stored in a 30% sucrose in PBS solution for at least 24 hours before slicing. Brains were sliced coronally at 40 µm, and relevant VTA slices were stained for tyrosine hydroxylase (TH), a marker for dopamine neurons (primary antibody: Chicken Anti-TH, Aves Labs; secondary antibody: Alexa Fluor 647 Donkey Anti-Chicken IgG, Jackson ImmunoResearch) and GFP, to enhance the viral expression signal (primary antibody: GFP Recombinant Rabbit Monoclonal Antibody, Thermo Fisher Scientific; secondary antibody: Alexa Fluor 488 Donkey Anti-Rabbit IgG, Life Technologies). Slices were then mounted with Fluoromount-G with DAPI (Thermo Fisher Scientific) to determine the location of cell nuclei.

CNN:
We introduce an open-source pipeline to automate freezing analysis of behavioral videos. The main motivation and advantage of this pipeline is automation of freezing analysis, even when mice are tethered to neural headgear: our analysis does not confuse headgear movement for mouse movement. Drawbacks to this technique include the necessity for manual scoring to train the network (it took approximately seven hours to score the 33,000 images we used to train each network), and the need to train different networks for different backgrounds and neural headgear. Additionally, having a graphics processing unit (GPU) greatly speeds up network training (1-2 hrs to train 200 epochs on a 320 nVidia P100 GPU vs. 1 week on a 32 GB RAM, Intel Core i7 CPU) and using the network for analysis (10 min to run 25,000 frames on a 320 nVidia P100 GPU vs. 1 hr on a 32 GB RAM CPU).
Training and using the network involved: (1) Human labeling of freezing behavior, (2) Generating images for CNN input, (3) Training the CNN, (4) Assessing CNN accuracy and precision, and (5) Using network for analysis. Custom MATLAB software was developed for parts (1)-(2), and custom Python software was developed for parts (3)-(5). Code for all steps will be shared via GitHub upon publication.
(1) Human Labeling of Freezing behavior: First, behavioral videos were broken down into individual frames. Next, random pairs of consecutive frames were selected for hand scoring as "1" ("freeze": no mouse movement occurred between frames) or "0" ("no freeze": mouse movement occurred between frames). Scored images were saved and processed in step (2). To ensure that all possible mouse movements are represented, we recommend using behavioral videos from random time points of at least 7 to 20 mice for each context. In particular, the fear conditioning context requires 15-20 mice due to low variability of movements within mice resulting from highly prevalent freezing. Mice in the fear extinction context are more apt to move around, thus 5-7 mice are sufficient to capture this variability. Each network needs 30-40k labels, or approximately 7 hours of hand scoring. The network trains best with roughly equal amounts of "freeze" and "no freeze" data.
(2) Generating images for CNN input: Pairs of consecutive frames used for hand labeling were condensed into a "difference image" $D_{i,j}$. This difference image reflects the threshold-normalized absolute value of the pixel intensity difference between consecutive images, where:
$D_{i,j}$ is the difference image, with $i, j \in \{1, 2, \ldots, N\}$, where $N$ is the number of pixels along each axis of the image
$\Delta P = \mathrm{abs}(P_{i,j,t} - P_{i,j,(t+1)})$, where $P_t$ is an individual frame from timepoint $t$ in the video
$\mu$ is the mean function
$\sigma$ is the standard deviation function
$\theta$ is the threshold value; the same threshold is used for all calculations. In our case, we used $\theta = 15$, set as approximately the top 0.05% percentile of $\Delta P$.
A scale factor ensures that the pixel value $D_{i,j}$ is at a maximum of 255 (for grayscale images).
(3) Training the CNN: Next, the difference images, along with their respective hand labels, were split across mice into train and test sets, where one mouse's images were either in the training or test set, but not both. This was analogous to K-fold cross validation, but with the data partitioned by mouse rather than randomly. This ensured that similarity of behavior within a mouse did not confound accuracy on the test dataset.
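A mouse-wise train/test split of the kind described for step (3) might look like the sketch below. The function name `split_by_mouse`, the `test_fraction` value, and the example IDs are hypothetical; the text specifies only that each mouse's images belong to exactly one of the two sets.

```python
import random

def split_by_mouse(mouse_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no mouse appears in both sets.

    mouse_ids holds one entry per labeled difference image, naming the
    mouse that the image came from. test_fraction is an assumed value.
    """
    mice = sorted(set(mouse_ids))
    rng = random.Random(seed)
    rng.shuffle(mice)
    n_test = max(1, round(len(mice) * test_fraction))
    test_mice = set(mice[:n_test])
    train_idx = [i for i, m in enumerate(mouse_ids) if m not in test_mice]
    test_idx = [i for i, m in enumerate(mouse_ids) if m in test_mice]
    return train_idx, test_idx

# Example: 5 mice contributing different numbers of labeled images.
mouse_ids = ["m1"] * 4 + ["m2"] * 3 + ["m3"] * 5 + ["m4"] * 2 + ["m5"] * 4
train_idx, test_idx = split_by_mouse(mouse_ids)
train_mice = {mouse_ids[i] for i in train_idx}
test_mice = {mouse_ids[i] for i in test_idx}
```

Splitting on the mouse identifier rather than on individual images keeps near-duplicate frames from the same animal out of the test set, which would otherwise inflate test accuracy.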
The training set was then used to train a CNN (ResNet18 (He et al. 2016), pretrained on ImageNet, batch size = 128, learning rate = 0.0001, Adam optimization (Kingma and Ba 2014), random rotation and flip) over the course of 200 epochs; the CNN weight matrix of each epoch was saved for analysis and network selection in the next section. The CNN aimed to minimize cross entropy loss (a form of negative log likelihood) using PyTorch (Paszke et al. 2019).
(4) Assessing CNN Accuracy and Precision: To ascertain which CNN epoch to use as the final classifier, we examine its test loss, accuracy, false positive rate and false negative rate. We accept epochs when test loss plateaus, accuracy is above 90%, and false positive rate (FPR) and false negative rate (FNR) are below or around 10%. The exact epoch can vary with each network, but it will be around 20-40 epochs. The CNN weight matrix from the appropriate epoch is then chosen as the classifier for behavioral analysis. Below are the methods used for calculating false positives, false negatives, true positives, true negatives, false positive rate and false negative rate:
False positives (FP): Number of frames that were classified as "freeze", but labeled "no freeze"
False negatives (FN): Number of frames that were classified as "no freeze", but labeled "freeze"
True positives (TP): Number of frames that were classified as "freeze" and labeled "freeze"
True negatives (TN): Number of frames that were classified as "no freeze" and labeled "no freeze"
False positive rate (FPR): $\mathrm{FPR} = \frac{FP}{FP + TN}$
False negative rate (FNR): $\mathrm{FNR} = \frac{FN}{FN + TP}$
(5) Using Network to Classify Freezing: All behavioral videos are broken down into individual frames, which are then formatted into "difference images". These difference images are fed as input into the CNN, and the resulting output is either "1" or "0", representing a label of "freeze" or "no freeze". Depending on the video sampling rate, further post-processing can take the average freezing of all frames for each second of the video, to obtain a "per second freezing percentage." In our particular videos, we use an 11.23 Hz frame rate.
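The error-rate definitions above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the acceptance check covers the accuracy, FPR and FNR criteria but leaves out the test-loss plateau, which needs the full loss curve. Because the text allows rates "below or around 10%", the hard 10% cutoff is itself a simplifying assumption.

```python
def confusion_counts(predicted, labeled):
    """Count TP, FP, TN, FN, with 1 = "freeze" and 0 = "no freeze"."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, y in zip(predicted, labeled):
        if p == 1 and y == 1:
            counts["TP"] += 1
        elif p == 1 and y == 0:
            counts["FP"] += 1
        elif p == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

def classifier_metrics(predicted, labeled):
    """Accuracy, FPR = FP / (FP + TN), and FNR = FN / (FN + TP)."""
    c = confusion_counts(predicted, labeled)
    total = sum(c.values())
    negatives = c["FP"] + c["TN"]
    positives = c["FN"] + c["TP"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total if total else 0.0,
        "fpr": c["FP"] / negatives if negatives else 0.0,
        "fnr": c["FN"] / positives if positives else 0.0,
    }

def epoch_is_acceptable(metrics, min_accuracy=0.90, max_rate=0.10):
    """Assumed hard thresholds; the loss-plateau criterion is not checked here."""
    return (metrics["accuracy"] > min_accuracy
            and metrics["fpr"] <= max_rate
            and metrics["fnr"] <= max_rate)

# Toy example: ten test frames with one false positive.
predicted = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
labeled = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
counts = confusion_counts(predicted, labeled)
metrics = classifier_metrics(predicted, labeled)
```

In this toy example the single false positive gives an FPR of 1/6, so the epoch would be rejected even though 9 of 10 frames are classified correctly.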
We trained separate networks for each of our four experimental contexts (fear conditioning vs. fear extinction; 1-fiber photometry vs. 2-fiber optogenetics), and used the above steps (1)-(5) to identify the frames in which mice were freezing during our behavioral videos.
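The "per second freezing percentage" post-processing could be sketched as below. This is a minimal illustration assuming each frame-pair label is timestamped by its first frame; the function name `per_second_freezing` is hypothetical and the input must contain at least one label.

```python
import numpy as np

def per_second_freezing(frame_labels, frame_rate_hz=11.23):
    """Percent of frames labeled "freeze" (1) within each second of video.

    With a non-integer frame rate such as 11.23 Hz, seconds contain
    either 11 or 12 frames; each second is averaged over its own frames.
    """
    labels = np.asarray(frame_labels, dtype=float)
    times = np.arange(labels.size) / frame_rate_hz    # timestamp of each label
    second = np.floor(times).astype(int)              # which second it falls in
    n_seconds = int(second.max()) + 1
    freeze_sum = np.bincount(second, weights=labels, minlength=n_seconds)
    frame_count = np.bincount(second, minlength=n_seconds)
    return 100.0 * freeze_sum / frame_count

# Example at 10 Hz: 1 s of freezing, 1 s of movement, then 1 s of half each.
labels = [1] * 10 + [0] * 10 + [1] * 5 + [0] * 5
percent = per_second_freezing(labels, frame_rate_hz=10.0)
```

Averaging per second puts the freezing trace on the same 1 s time base as the binned photometry data, which makes tone-aligned comparisons straightforward.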