Discrete and coordinated encoding of punishment contingent on rewarded actions by prefrontal cortex and VTA

Actions motivated by a rewarding outcome are often associated with a risk of punishment. Little is known about the neural representation of punishment that is contingent on reward-guided behavior. We modeled this circumstance by using a task where actions were consistently rewarded but probabilistically punished. Spike activity and local field potentials were recorded during this task simultaneously from VTA and mPFC, two reciprocally connected regions implicated in both reward-seeking and aversive behavioral states. At the single unit level, we found that ensembles of VTA and mPFC neurons encode the contingency between action and punishment. At the network level, we found that coherent theta oscillations synchronize the VTA and mPFC in a bottom-up direction, effectively phase-modulating the neuronal spike activity in the two regions during punishment-free actions. This synchrony declined as a function of punishment contingency, suggesting that during reward-seeking actions, risk of punishment diminishes VTA-driven neural synchrony between the two regions.


Introduction
Goal-directed actions aimed at obtaining a reward often involve exposure to an aversive event or punishment. For example, foraging for food in the wild may result in encountering a predator. In a causally and socially complex world, appropriate representation of punishment that is contingent on reward-seeking actions is critical for survival and optimal action selection. Deficits in this representation may be associated with detrimental behavioral patterns observed in addictive disorders while exaggerated representation of punishment may be linked to anxiety-

[...]

we further hypothesized that the interaction between VTA and mPFC provides a dynamic representation of punishment contingent on a reward-seeking action and punishment-based modulation of that action.

To test these hypotheses, we first designed and validated a task that allowed us to assess reward-directed instrumental behavior in the absence or presence of action-contingent punishment in the same recording session. The latter criterion was critical because it allowed us to track the activity of the same ensembles of neurons as the punishment contingency associated with the same instrumental behavior changed. The task was designed so that the action always procured a reward but the same action probabilistically led to punishment with block-wise varying degrees of contingency. Thus, different blocks had varying action-punishment contingency whereas the action-reward contingency remained constant. We then recorded single unit activity and local field potentials (LFP) from the VTA and mPFC simultaneously during this task. The simultaneous recording allowed us to characterize and

[...]

distinct time-varying patterns in encoding of punishment by different unit groups. We then examined the time course of individual neuronal encoding in the peri-action epoch.
We recalculated the peri-action ωPEV using a narrower moving window (50-ms width, 5-ms step) to reveal time points of individual neuronal encoding of punishment with a higher degree of temporal precision. VTA DA neuronal encoding appeared to be concentrated specifically around

[...]

Figure 6g-h). Theta oscillations appeared to be coherent between the two regions during the punishment-free action, but the coherence was significantly reduced as a function of punishment (Figure 6i). A significant interaction between punishment and frequency band was observed in LFP coherence during the pre-action period (Repeated measures ANOVA, post-cue, F(52, 1092) = 0.93, p = 0.62; pre-action, F(52, 1092) = 2.45, p < 0.001).

To examine mutual influences (directionality) of LFP time series between the two regions, we quantified Granger causal influences (GC) in VTA-to-mPFC and mPFC-to-VTA directions (Experimental procedure). During punishment-free action in block 1, the theta oscillation was driven by VTA, as the GC influence of VTA on mPFC was significantly greater than the GC influence in the other direction (Figure 6j). A significant interaction between directionality and frequency band was observed in GC coefficients in all blocks, indicating that the oscillation directionality varied across frequency bands (Figure 6j; [...]). Importantly, post hoc analysis revealed significantly greater VTA-to-mPFC directionality in frequency bands including the theta band in block 1, and the directionality became unclear in blocks 2 and 3 (Figure 6j, data not shown for block 2). Taken together, these results suggest that the VTA-driven theta oscillation entrains the VTA-mPFC circuit during punishment-free action. A decline in this entrainment may represent punishment contingent on the action, since power, coherence, and directionality of the theta oscillation declined as a function of punishment contingency. Analyses of no-shock control data revealed that these punishment-dependent changes in the VTA-mPFC theta oscillations were not evident in the absence of punishment (Figure S4).

Punishment-induced reduction in local and interregional LFP-spike synchrony
Synchronous oscillations can provide temporal coordination of spike activity of local and

[...]

values < 0.005). These indicated an entrainment of spike timing by preceding cycles of the oscillation, i.e., directionality from the theta oscillation to the spike activity. To examine the modulation of LFP-spike phase-locking by punishment, we compared PLVs across different blocks. A trend toward reduction in PLV was found in block 3 compared with block 1 in mPFC (Figure 7c; Signed-rank test, p = 0.077), and a significant reduction was found in VTA (Figure 7f; p = 0.006). Likewise, a trend toward reduction in the proportion of phase-locked units was observed in block 3 compared with block 1 in mPFC (Figure 7b; Chi-square test, χ²(1) = 3.25, p = 0.071), and a significant reduction was found in VTA (Figure 7e; χ²(1) = 4.31, p = 0.038). We next examined VTA DA and non-DA neuronal phase-locking separately. A greater fraction of DA units (45 %) appeared to be phase-locked compared with non-DA units (23 %) in block 1 (Figure 7g; Chi-square test, χ²(1) = 5.04, p = 0.025). The DA neuronal PLV in block 1 significantly declined as a function of punishment contingency in blocks 2 and 3 (Figure 7h; Signed-rank test, p values < 0.01), whereas the non-DA neuronal PLV did not differ across blocks (p values > 0.43). These indicated that the punishment-induced reduction in the VTA neuronal phase-locking was predominantly due to the reduction in DA neuronal synchrony.

Next we examined the interregional LFP-spike phase-locking between VTA and mPFC. Based on the Granger causal influence indicating VTA-to-mPFC directionality in theta oscillations, we anticipated stronger mPFC neuronal synchrony to the VTA theta oscillation than that of the other direction. Consistent with this, we found that a substantial proportion of mPFC units (31 %) were phase-locked to the VTA theta oscillation in block 1. A representative mPFC unit with significant phase-locking is shown in Figure 8a-b. The interregional spike-phase synchrony emerged during the action compared to the baseline (Figure 8c, Signed-rank test, p values < 0.001). We examined directionality of the LFP-spike synchrony using the time-lagged phase-locking analysis. In block 1, the majority of phase-locked units had their peak PLVs with a negative lag (Figure 8d; Signed-rank test, p = 0.066). Likewise, greater proportions of phase-locked units were observed on negative lags (Figure 8d, bottom). In addition, the mean PLV across negative time lags appeared to be greater than that of the positive lags (Figure 8e; Signed-rank test, p = 0.023). These indicate mPFC neuronal entrainment to preceding VTA theta oscillatory cycles, i.e., VTA-to-mPFC directionality. When compared across blocks, the mPFC neuronal entrainment by the VTA theta oscillation declined as a function of punishment contingency (Figure 8e-g, Signed-rank test, p = 0.003). As the degree of phase-locking diminished, the VTA-to-mPFC directionality also declined (Figure 8d-e). We also examined the VTA neuronal phase-locking to the mPFC theta oscillation. The degree of VTA neuronal phase-locking to the mPFC theta oscillation appeared to be much weaker than the mPFC neuronal phase-locking to the VTA theta oscillation (Wilcoxon Rank-sum test, p < 0.001), corroborating the VTA-to-mPFC directionality in the theta-oscillation-mediated spike phase modulation (Figure S5). The PLVs did not differ across blocks in either DA or non-DA units (Figure S5). Analyses of no-shock control data showed unchanging neural synchrony across blocks in the absence of punishment (Figure S6-7).

In sum, we found that a coherent theta oscillation temporarily synchronized the VTA-mPFC neural circuit during rewarded instrumental action. This synchrony declined as a function of punishment contingency (Figure 8h).
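The phase-locking value (PLV) used throughout this section can be illustrated with a short sketch. The procedure the authors used to extract theta phase is not part of the text shown here, so the sketch assumes a precomputed instantaneous phase series (in radians, one value per LFP sample) and spike times given as sample indices. The function names, the 8 Hz synthetic oscillation, and the 1 kHz sampling rate are all hypothetical choices for illustration.

```python
import cmath
import math
import random


def phase_locking_value(phases):
    """Length of the mean resultant vector of spike phases.

    0 means phases are spread uniformly; 1 means every spike
    occurs at exactly the same phase.
    """
    if not phases:
        raise ValueError("need at least one spike phase")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


def lagged_plv(spike_idx, lfp_phase, lag):
    """PLV computed from the LFP phase `lag` samples away from each spike.

    A negative lag reads the phase that preceded the spike.
    Spikes whose shifted index falls outside the recording are dropped.
    """
    n = len(lfp_phase)
    phases = [lfp_phase[i + lag] for i in spike_idx if 0 <= i + lag < n]
    return phase_locking_value(phases)


# Synthetic example (assumed values): 10 s of an 8 Hz oscillation at 1 kHz.
fs, freq, n_samples = 1000, 8.0, 10_000
lfp_phase = [(2 * math.pi * freq * t / fs) % (2 * math.pi) for t in range(n_samples)]

# "Locked" unit: fires only within 0.3 rad of phase 0.
locked_spikes = [t for t, p in enumerate(lfp_phase)
                 if p < 0.3 or p > 2 * math.pi - 0.3]

# "Unlocked" unit: fires at random samples.
rng = random.Random(0)
random_spikes = rng.sample(range(n_samples), 500)

plv_locked = lagged_plv(locked_spikes, lfp_phase, lag=0)
plv_random = lagged_plv(random_spikes, lfp_phase, lag=0)
plv_locked_past = lagged_plv(locked_spikes, lfp_phase, lag=-10)
print(plv_locked, plv_random, plv_locked_past)
```

With a perfectly stationary sinusoid, as in this toy example, shifting the lag only rotates the preferred phase and leaves the PLV essentially unchanged. A lag profile becomes informative only with real LFP, where the phase drifts over time.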

Discussion
To unravel the VTA and mPFC neural representation of punishment contingent on goal-directed behavior, we engaged animals in an instrumental task where an action consistently procured a reward but probabilistically led to punishment. Simultaneous recording from VTA and mPFC demonstrated that these regions use multiple coding structures, involving spike-rate and LFP-mediated neural synchrony, to represent punishment contingent on goal-directed actions. VTA and mPFC single neurons encoded the same action differently if that action was punishment-free versus punishment-prone, suggesting that these neurons encode the contingency between

[...]

Importantly, the DA neuronal encoding of punishment was concentrated around the time of the action compared with other task epochs, suggesting that DA neuronal signaling of punishment may primarily reflect the action-punishment contingency. On longer timescales, punishment can elicit persistent changes in motivational and emotional states (e.g., changes in mood). We found that mPFC and VTA non-DA neurons display temporally diffuse encoding of punishment within the peri-action window. Likewise, many of the mPFC and non-DA neurons showed significant modulation of their baseline firing rates, suggesting that these neurons may encode punishment with persistent changes in activity. This sustained change in activity may be responsible for longer-lasting affective impact of punishment.

[...]

In contrast, a greater proportion of DA neurons showed excitatory encoding of punishment; i.e., they treated appetitive and aversive components in the same direction, and responded to actions prone to punishment with further excitation. We observed that excitatory encoding of punishment contingency was more predominant among DA neurons, suggesting that the contingency of punishment is not simply encoded as reduced value of the action.

An instrumental task with varying punishment-action contingency
After the postsurgical recovery, rats were kept at 85 % of their free-feeding weight on a restricted diet of 13 g food pellets a day with free access to water. In an operant chamber, rats were fully trained to make an instrumental nose poke to the cue port to receive a sugar pellet at the food trough located on the opposite side of the chamber on the fixed ratio schedule of one, i.e., FR1 (Figure 1a-b). After completion of three FR1 sessions consisting of 150 trials in 60 mins, rats were trained with the task consisting of three blocks with varying degrees of action-punishment contingency (50 trials per block). Each block was assigned an action-punishment contingency of 0, 0.06, or 0.1, i.e., the conditional probability of receiving an electrical foot shock (0.3 mA, 300 ms) given an action. The action-reward contingency was kept at 1 across all training and recording sessions; i.e., every nose poke procured a reward even in the shock trials. To minimize generalization of the action-punishment contingency across blocks, the blocks were organized in an ascending shock probability order (Block 1: 0, Block 2: 0.06, Block 3: 0.1), interleaved with a 2-min timeout between blocks. In blocks 2 and 3 of each session, 3 and 5 trials, respectively, were pseudo-randomly selected and followed by an electrical foot shock. No explicit cue was provided on shock trials to keep the shock occurrence unpredictable. The cue onset only signaled initiation of a trial. Animals were informed of the block shift by the 2-min darkened timeout in between blocks. In addition, the first shock trial of block 2 and the first two shock trials of block 3 were randomly selected from the initial 5 trials of each block. Also, animals completed two sessions of this task before the recording session; thus, the shock occurrence and the task design including the ascending punishment contingency were not novel to them at the time of the recording session.
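The block structure described above can be sketched as a small schedule generator. This is an illustration only and not the authors' task-control code. Trial counts, shock counts, and the rule about the initial 5 trials come from the text. The function names are hypothetical, and the sketch assumes that the remaining shock trials are drawn from the trials after the first 5, which the text does not specify.

```python
import random

# (block label, trials per block, shock trials, shocks placed in the first 5 trials)
BLOCKS = [
    ("block1", 50, 0, 0),
    ("block2", 50, 3, 1),
    ("block3", 50, 5, 2),
]
EARLY_WINDOW = 5  # "initial 5 trials of each block"


def make_block(n_trials, n_shocks, n_early, rng):
    """Return a list of booleans, True where the trial ends with a shock."""
    early = rng.sample(range(EARLY_WINDOW), n_early)
    # Assumption: the rest of the shocks fall after the early window.
    late = rng.sample(range(EARLY_WINDOW, n_trials), n_shocks - n_early)
    shocked = set(early) | set(late)
    return [trial in shocked for trial in range(n_trials)]


def make_session(seed=0):
    """Build one session with blocks in ascending shock-probability order."""
    rng = random.Random(seed)
    return {label: make_block(n, k, e, rng) for label, n, k, e in BLOCKS}


session = make_session(seed=1)
for label, flags in session.items():
    # Empirical action-punishment contingency = shocked trials / trials.
    print(label, sum(flags) / len(flags))
```

Fixing the number of shocked trials per block (3 of 50 and 5 of 50) reproduces the stated contingencies of 0.06 and 0.1 exactly in every session, which independent per-trial sampling would not guarantee.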
All training and recording sessions were terminated if not completed in 180 mins, and data from the completed sessions only were analyzed. Animals displayed stable behavioral performance overall without any sign of contextual fear conditioning, since they performed fearlessly in the safe block across all sessions. In addition, there was no evidence for habituation to the shock as they showed equivalent punishment-based behavioral changes across sessions. For the diazepam pretreatment experiment, a separate group of rats (N = 9) was trained using the abovementioned procedure, and they underwent three test sessions with intraperitoneal pretreatment in the order saline, diazepam (2 mg/kg, Hospira, Inc.), saline. Injected animals were returned to their home cage for 10 minutes before they were placed in the operant chamber. A three-day washout period was allowed between sessions.

Electrophysiology

Single-unit activity and local field potentials (LFPs) were recorded simultaneously using a pair of eight-channel Teflon-insulated stainless steel 50 µm microwire arrays (NB Laboratories). Unity-gain junction field effect transistor headstages were attached to a headstage cable and a motorized commutator nonrestrictive to the animals' movement. Signals were amplified via a multichannel amplifier (Plexon). Spikes were bandpass filtered between 220 Hz and 6 kHz, amplified ×500, and digitized at 40 kHz. Single-unit activity was then digitally high-pass filtered at 300 Hz and LFPs were low-pass filtered at 125 Hz. Continuous single-unit and LFP signals were stored for offline analysis. Single units were sorted using the Offline Sorter software (Plexon). Only the single units with a stable waveform throughout the recording session were further analyzed. If a unit presented a peak of activity at the time of the reference unit's firing in the cross-correlogram, only one of the two units was further analyzed.

Neural data analysis
Single unit and LFP data analyses were conducted with Matlab (MathWorks) and SPSS statistical software (IBM). For single unit data analyses, 1-ms binned spike count matrices of the peri-cue, action, and reward periods (starting 2 s before each event and ending 2 s after each event) were produced per unit. The baseline period was a 2-s time window beginning 2.5 s before the cue onset. For all neural data analyses, the trials with shock delivery (three and five trials for blocks 2 and 3, respectively) were excluded as single-unit and LFP signals in these trials were affected by electrical artifacts during shock delivery.

Trial-averaged firing-rate analysis. Spike count matrices were further binned using a 200 ms rectangular moving window with steps of 50 ms within the -2 to 2 s epoch aligned to the task event occurring at time = 0 for the firing rate analysis. Binned spike counts were transformed to firing rates and averaged across trials. The trial-averaged firing rate of each unit was Z-score normalized using the mean and standard deviation of its baseline firing rate.

[...]

Bivariate Granger causality analysis. To examine mutual influences (directionality) between LFP oscillations in the two regions, we quantified Granger causality between the simultaneously recorded peri-action LFP traces (-2 to 2 s around the action occurring at time = 0). The bivariate Granger causality (G-causality) infers causality between two time series based on temporal precedence and predictability (Barnett and Seth, 2014; Granger, 1969). That is, a variable X1 'Granger causes' a variable X2 if information in the past of X1 helps predict the future of X2 with better accuracy than is possible when considering only information already in the past of X2 itself.
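The definition of G-causality given above can be made concrete with a minimal time-domain sketch. It is not the spectral measure used in this study. It assumes a single lag and ordinary least squares, and it runs on simulated data in which one series drives the other. The function name `granger_lag1` and all simulation parameters are hypothetical. The statistic is the log ratio of the residual variance of a restricted model (past of the target only) to that of a full model (past of both series).

```python
import math
import random


def _demean(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def granger_lag1(source, target):
    """Lag-1 Granger causality from `source` to `target`.

    Returns ln(RSS_restricted / RSS_full); values near 0 mean the past
    of `source` adds no predictive power for `target`.
    """
    s, t = _demean(source), _demean(target)
    y, tp, sp = t[1:], t[:-1], s[:-1]  # future target, past target, past source

    # Restricted model: y = a * tp
    stt = sum(v * v for v in tp)
    sty = sum(a * b for a, b in zip(tp, y))
    a_r = sty / stt
    rss_r = sum((yi - a_r * ti) ** 2 for yi, ti in zip(y, tp))

    # Full model: y = a * tp + b * sp, solved via the 2x2 normal equations
    sss = sum(v * v for v in sp)
    sts = sum(a * b for a, b in zip(tp, sp))
    ssy = sum(a * b for a, b in zip(sp, y))
    det = stt * sss - sts * sts
    a_f = (sty * sss - sts * ssy) / det
    b_f = (stt * ssy - sts * sty) / det
    rss_f = sum((yi - a_f * ti - b_f * si) ** 2
                for yi, ti, si in zip(y, tp, sp))

    return math.log(rss_r / rss_f)


# Simulated example (assumed parameters): "vta" drives "mpfc" with a one-sample delay.
rng = random.Random(0)
n = 2000
vta = [rng.gauss(0, 1) for _ in range(n)]
mpfc = [0.0] + [0.8 * vta[i - 1] + 0.5 * rng.gauss(0, 1) for i in range(1, n)]

gc_vta_to_mpfc = granger_lag1(vta, mpfc)
gc_mpfc_to_vta = granger_lag1(mpfc, vta)
print(gc_vta_to_mpfc, gc_mpfc_to_vta)
```

In this simulation the VTA-to-mPFC value is large and the reverse value is close to zero, which is the kind of asymmetry the directionality analysis looks for. Real LFP analyses would need a higher model order and the frequency-resolved formulation.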
In this framework, two time series X1(t) and X2(t) recorded from mPFC and VTA can be

[...]

The spectral G-causality from 1 to 2 is then obtained by:

[...]

The spectral G-causality measure has no known statistical distribution; thus, a random permutation method was used to generate a surrogate distribution, by which the upper bound of

[...]

We repeated the analysis with different time lags and analysis windows, and confirmed that the results were very similar across different parameters.
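The equations of the Granger causality paragraph are missing from the text above. For orientation only, the standard bivariate autoregressive model and spectral G-causality measure from the cited framework (Barnett and Seth, 2014) are conventionally written as below. This is the textbook form, given on the assumption that the authors followed it. The model order p, the coefficient symbols A, the transfer function H, and the spectral density S are notation introduced here and are not taken from the original.

```latex
% Bivariate autoregressive model of order p
\begin{aligned}
X_1(t) &= \sum_{k=1}^{p} A_{11,k}\, X_1(t-k) + \sum_{k=1}^{p} A_{12,k}\, X_2(t-k) + \varepsilon_1(t), \\
X_2(t) &= \sum_{k=1}^{p} A_{21,k}\, X_1(t-k) + \sum_{k=1}^{p} A_{22,k}\, X_2(t-k) + \varepsilon_2(t),
\end{aligned}
\qquad
\Sigma = \operatorname{cov}\!\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix}

% Transfer function and cross-spectral density at frequency \lambda
H(\lambda) = \Bigl( I - \sum_{k=1}^{p} A_k\, e^{-i \lambda k} \Bigr)^{-1},
\qquad
S(\lambda) = H(\lambda)\, \Sigma\, H(\lambda)^{*}

% Spectral G-causality from X_1 to X_2
f_{1 \to 2}(\lambda) = \ln \frac{ S_{22}(\lambda) }
{ S_{22}(\lambda) - H_{21}(\lambda)\, \Sigma_{1|2}\, H_{21}(\lambda)^{*} },
\qquad
\Sigma_{1|2} = \Sigma_{11} - \frac{\Sigma_{12}\, \Sigma_{21}}{\Sigma_{22}}
```

The measure is zero at a given frequency when the past of X1 contributes nothing to the power of X2 at that frequency, and it grows as that contribution grows.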

Statistical analysis
Parametric statistical tests were used for z-score normalized data and non-normalized data that are conventionally tested using a parametric test. Nonparametric approaches, such as conventional nonparametric tests or bootstrapping, were used for hypothesis tests of data whose statistical distribution is unknown, e.g., phase-locking values (PLVs). For all tests, the Greenhouse-Geisser correction was applied as necessary due to violations of sphericity. All statistical tests were specified as two-sided. Multiple testing correction was applied for all tests including multiple comparisons using the Bonferroni correction.

[Figure: Phase-locking of VTA spikes to mPFC theta oscillation]