Anterior cingulate cortex provides the neural substrates for feedback-driven iteration of decision and value representation

Adjusting decision-making under uncertain and dynamic situations is a hallmark of intelligence. It requires a system that can convert feedback information into updates of internal value. The anterior cingulate cortex (ACC) is involved in processing error and reward events that prompt switching or maintenance of current decision strategies. However, it is unclear whether and how changes in stimulus-action mapping during behavioral adaptation are encoded, and how such computation drives decision adaptation. Here, we tracked ACC activity in male mice performing go/no-go auditory discrimination tasks with manipulated stimulus-reward contingencies. Individual ACC neurons integrate outcome information into the value representation in the next-run trials. Their dynamic recruitment determines the learning rate of error-guided value iteration and decision adaptation, forming a non-linear feedback-driven updating system that secures the appropriate decision switch. Optogenetically suppressing ACC significantly slowed feedback-driven decision switching without interfering with the execution of the established strategy.

Mean population response to go and no-go stimulus in four task sessions (n = 493 neurons from 5 mice). The symbol * denotes a significant difference between go and no-go responses in the same session (two-sided Wilcoxon signed-rank test). The symbol # denotes a significant difference in go response between the Stable session and the other sessions; the symbol $ represents a significant difference in no-go response between the Stable session and the other sessions (two-sided Friedman test with post-hoc Bonferroni comparisons, comparing each session with the Stable session). c The proportion of ACC neurons with significant activated and suppressed responses during the stimulus window in four task sessions (n = 493 neurons from 5 mice). The symbol * denotes a significant difference between go and no-go responses in the same session (two-sided chi-square test). The symbol # denotes a significant difference in go response between the Stable and the other sessions; the symbol $ represents a significant difference in no-go response between the Stable and the other sessions (two-sided chi-square test with post-hoc Bonferroni comparisons, comparing each session to the Stable session). **P < 10⁻², ***P < 10⁻³, ###P < 10⁻³, $$P < 10⁻², $$$P < 10⁻³. In box plot: center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 3 for the first time (two-sided unpaired t-test). **P < 10⁻². Data are presented as mean ± s.e.m.
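The box-plot convention stated in these legends (center line at the median, box limits at the quartiles, whiskers at 1.5 × interquartile range) can be illustrated with a minimal sketch. The function name `box_plot_stats` is hypothetical. The quartile interpolation method and the Tukey-style rule that whiskers end at the most extreme data point inside the 1.5 × IQR fences are assumptions; the legends do not specify either, and the plotting software used for the figures may compute them differently.

```python
import statistics


def box_plot_stats(values):
    """Summarize a sample the way a Tukey-style box plot would draw it.

    Assumptions (not stated in the legends): quartiles use linear
    interpolation over the closed data range, and whiskers stop at the
    most extreme observation within 1.5 x IQR of the box.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Lower quartile, median, upper quartile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

With the made-up sample above, the single extreme value falls outside the upper fence and is reported as an outlier rather than stretching the whisker.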

GCaMP6f in ACC neurons. a Task performance (discrimination index, d′) of individual mice during the initial training and reversal learning process. The background color indicates the reversal of the stimulus-reward contingencies. n = 6 mice in the initial training and n = 5 mice in the reversal training. One mouse was removed when switching to reversal training due to cranial window infection. b-c Licking performance of an example mouse on different days of training sessions (indicated by blue circles in a). Trials are sorted into go (blue shading) and no-go (red shading) trials. Dark dots: anticipatory licks (licking during the stimulus window); gray dots: consumption licks (licking after reward or air-puff delivery). Plus and cross symbols represent reward and air-puff delivery in Hit and FA trials, respectively. d Scheme of virus-mediated expression of GCaMP6f in excitatory neurons in the ACC (coronal view). e The ACC boundary is identified with retrograde labeling from the retrosplenial cortex (RSP) with AAV2/R-mCherry (left). A representative image of retrogradely labeled neurons in ACC (right) is shown in the coronal section. The white dashed line marks the boundary of ACC. f An example in vivo two-photon microscopic image of ACC neurons expressing GCaMP6f (left) and mCherry (middle, retrogradely labeled from RSP).

Supplementary Fig. 
2 Overall population response during all four task sessions. a Trial-averaged response of recorded ACC neurons during the Hit, Miss, CR (correct rejection), FA (false alarm), RO (reward omission), and UR (un-cued reward) trials in all four task sessions (n = 493 neurons from 5 mice). Each row represents one neuron. Neurons are sorted in the same order in all subgraphs. Vertical white lines indicate the onset of the stimulus and response windows. Horizontal whitespace indicates the lack of a certain type of trial (the number of trials of one type of decision  2 times) in the recorded session. b ACC activity in the stimulus window is not determined by licking. a The single-trial activity of two example neurons in Hit trials. b Heat maps of the trial-averaged response of all recorded neurons during Hit trials, aligned to the onset of the stimulus or to the first anticipatory lick. c Population-averaged Ca²⁺ traces of neurons aligned to the onset of the stimulus and the first anticipatory lick (mean ± s.e.m.). d Histogram of neural response and first anticipatory lick latency relative to the onset of the stimulus (n = 194 trials from 6 mice, two-sided Wilcoxon rank-sum test). a-d Data from the Hit trials in the Stable session. e Top: histogram showing the distribution of anticipatory licking onset time of an example mouse in the Uncertain session. All Hit trials were divided into early (blue) and late (red) lick trials according to the upper and lower quartiles of the distribution. Bottom: lick raster of the same mouse, sorted by onset time of the first lick. f-h Anticipatory lick latency (f), histogram of neural response latency (g), and anticipatory lick number (h) of early and late licking trials (n = 49/49 trials, two-sided Wilcoxon rank-sum test). i Left: average population responses to the go stimulus in early (blue) and late (red) licking trials (mean ± s.e.m.). Right: mean population response of different trials (n = 549 neurons from 6 mice, two-sided Wilcoxon 
signed-rank test). j-n Similar to e-i, but for the FA trials of the Reversal session. o Schematic of the GLM. Inset: predicted and actual ΔF/F signal for an example neuron. p Left: relative contribution of each variable (n = 549 neurons from 6 mice, two-sided Kruskal-Wallis test with post hoc Bonferroni multiple comparisons). Right: the distribution of the relative contribution of each variable. *P < 0.05, **P < 10⁻², ***P < 10⁻³. In box plot: center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 4 ACC response to the go and no-go stimulus across T1-T3 phases in the Uncertain session. a Licking performance and trial outcomes of an example mouse in the Stable and Uncertain sessions. b Single-trial activity of an example neuron. c Population-averaged responses to the go stimulus (left) and sucrose reward (right) in all Hit trials (n = 549 neurons from 6 mice). Dots indicate the time segments when the two traces are different (P < 0.05, two-sided Wilcoxon rank-sum test). d Left, population-averaged responses to the go stimulus following normal trials, RO trials, and UR trials. Dots indicate the time segments when traces are different (P < 0.05, two-sided Kruskal-Wallis test with post-hoc Bonferroni comparisons). Purple dots: post normal trials vs. post RO trials; brown dots: post normal trials vs. post UR trials; black dots: post RO trials vs. 
post UR trials. Right, Hit rate of go trials following different outcomes. e Trial-averaged responses to the go stimulus in trials following unexpected outcomes and following other normal outcomes in the T1-T3 phases. f Mean response to the go stimulus following different outcomes in the T1-T3 phases (n = 549 neurons from 6 mice). Symbol *: differences between the trials following the same outcomes in different phases (two-sided Friedman test with post-hoc Bonferroni comparisons); $: differences between the trials following different outcomes in the same phase (two-sided Wilcoxon signed-rank test). g The Hit rate of trials following different outcomes. h Left, population-averaged responses to the no-go stimulus following different outcomes. Right, FA rate of no-go trials following different outcomes. i Mean population response to the no-go cue. j Correlation of response to the no-go stimulus. k Accuracy of decoding stimulus identities from neuronal activity in the T1 and T2 phases in Uncertain sessions using classifiers trained on the population activity of different trial phases. **P < 10⁻², ***P < 10⁻³, $P < 0.05, $$$P < 10⁻³. Data analyzed by (d, n = 6 mice) two-sided one-way repeated measures ANOVA with post hoc Tukey's multiple comparisons, (g, n = 6 mice) two-sided two-way repeated measures ANOVA with post-hoc Bonferroni comparisons, (h, n = 6 mice) two-sided paired t-test, (i, n = 594 neurons from 6 mice) two-sided Friedman test with post-hoc Bonferroni comparisons, or (k, n = 500 times repeat) two-sided Kruskal-Wallis test with post-hoc Bonferroni comparisons. Data are presented as (c, d, e, g, h) mean ± s.e.m. or (f, i, k) box plots (center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range). Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 
5 The anticipatory and consumption licking rates are stable throughout the Uncertain session. a The anticipatory licking rate of the Hit trials in the Stable session and T1-T3 phases of the Uncertain session (mean ± s.e.m.). b The number of anticipatory licks (left) and lick latency (right) in each phase (n = 67-194 trials from 6 mice; two-sided Kruskal-Wallis test with post-hoc Bonferroni comparisons). c The consumption licking rate relative to the onset of reward delivery in the Stable session and T1-T3 phases of the Uncertain session (left, mean ± s.e.m.), and the number of total consumption licks in each phase (right, n = 70-194 trials from 6 mice; two-sided Kruskal-Wallis test with post-hoc Bonferroni comparisons). In box plot: center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 6 Outcome monitoring neurons detect the unexpected reward. a Trial-averaged neuronal response to reward in the Hit (left) and UR (right) trials. White vertical lines denote the time of reward delivery. Each row represents one neuron; neurons are sorted by functional category, indicated by the vertical color bars to the left of the heat map (orange bar, outcome monitoring neurons identified in Fig. 3g; black bar, other neurons). b Mean population responses of all recorded neurons during the 1-s window after reward delivery in the Hit and UR trials (n = 549 neurons from 6 mice, two-sided Wilcoxon signed-rank test). c Mean population responses of identified outcome monitoring neurons (marked by light orange vertical bars adjacent to heat maps in a) during the 1-s window after reward delivery in the Hit and UR trials (n = 179 neurons from 6 mice, two-sided Wilcoxon signed-rank test). Results suggest that outcome monitoring neurons not only monitored unexpected omission of reward (RO, Fig. 
3c) but also the unexpected granting of reward. *P < 0.05, ***P < 10⁻³. In box plot: center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 7 The difference in outcome-evoked activity between the Hit and RO trials emerges with time in the Uncertain session. a Mean response of all neurons during the 1-s window after reward omission or delivery during RO and Hit trials in the T1-T3 phases of the Uncertain session (n = 549 neurons from 6 mice). The symbol * denotes significant differences between the trials following the same outcomes in different phases (two-sided Friedman test with post-hoc Bonferroni comparisons) and the symbol $ denotes significant differences between the trials following different outcomes in the same phase (two-sided Wilcoxon signed-rank test). b Mean responses of outcome monitoring neurons during the 1-s window after reward omission or delivery during RO and Hit trials in the T1-T3 phases of the Uncertain session (n = 179 neurons from 6 mice). The symbol * denotes significant differences between the trials following the same outcomes in different phases (two-sided Friedman test with post-hoc Bonferroni comparisons) and the symbol $ denotes significant differences between the trials following different outcomes in the same phase (two-sided Wilcoxon signed-rank test). Note that there were only 1-6 RO trials and 4-12 Hit trials in each phase. *P < 0.05, $P < 0.05, $$$P < 10⁻³. In box plot: center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 
8 Go cue-evoked response in the ACC neurons remains low in the Reversal session. a Licking performance and trial outcomes of an example mouse in the Reversal session (left, the same example mouse as in Fig. 4c), and the response of an example neuron (right). Each row represents the response in one trial and all trials are sorted by trial type. b Left, population-averaged go stimulus-evoked responses in the trials following FA trials and following other non-FA trials (red and black traces, respectively). Black dots indicate the time segments when the two traces are different (P < 0.05, two-sided Wilcoxon rank-sum test). Right, the Hit rate of trials following FA and non-FA trials (red and black bars, respectively; n = 5 mice, two-sided paired t-test). Notably, ACC responses to both go and no-go stimuli decreased following an FA trial, possibly indicating dissociation of the stimulus from the lick decision after being repeatedly punished upon licking. c Mean population response to the 12 kHz tone in the Stable session and three phases of the Reversal session (two-sided Friedman test with post-hoc Bonferroni comparisons). d Correlation of 12 kHz tone-evoked population activity between the Stable session and T1-T3 phases of the Reversal session. **P < 10⁻², ***P < 10⁻³. Data are presented as (b) mean ± s.e.m. or (c) box plots (center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range). Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 
9 Value representation is consistent throughout the Re-stable session. a Licking performance and trial outcomes of an example mouse in the Re-stable session (left), and the response of an example neuron (right). Each row represents one trial and all trials are sorted by trial type. b Mean population response to the no-go stimulus in the Re-stable session (n = 493 neurons from 5 mice, two-sided Friedman test with post-hoc Bonferroni comparisons). c Left, the population-averaged no-go stimulus-evoked responses following FA (post FA) and the other (post non-FA) trials. Right, the FA rate of no-go trials following different trials (two-sided paired t-test). d The mean normalized population responses after air-puff delivery in the early (first 15% of trials) and late (last 85% of trials) periods of the Re-stable session (two-sided Wilcoxon signed-rank test). Inset, population-averaged Ca²⁺ traces (mean ± s.e.m.). e The mean normalized population responses to the no-go cue in the early and late periods (two-sided Wilcoxon signed-rank test). Inset, population-averaged Ca²⁺ traces (mean ± s.e.m.). f Stimulus selectivity index (SI) of individual neurons in four task sessions (n = 493 cells from 5 mice; SI > 0: prefers to respond to the go cue, SI < 0: prefers to respond to the no-go cue). Symbol *: difference between SI and 0 (two-sided one-sample Wilcoxon signed-rank test); #: difference between the SI of the Stable session and the other sessions (two-sided Kruskal-Wallis test with post-hoc Bonferroni comparisons, comparing with the Stable session). Colored dots indicate neurons with significant SI in each session (P < 0.05, permutation test). g The change of stimulus selectivity of longitudinally tracked neurons across sessions. Only neurons with significant selectivity in at least one of the two sessions were included for analysis. *P < 0.05, **P < 10⁻², ***P < 10⁻³, ###P < 10⁻³. Data are presented as (c) mean ± s.e.m. 
or (d, e) box plots (center line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range). Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.

Supplementary Fig. 10 ACC activity in the stimulus and response window correlated well with the Q-value and prediction error estimated by the SARSA model. a Schematic diagram of the process of using a SARSA model to simulate trial-by-trial Q-value and action sequence according to the value iteration and QP transformation function. b Licking probability and Q-value across T1-T3 phases in the Uncertain session of the experimental group and the simulated group. Blue shade: 95% distribution of simulated data (3,000 repeats). Mean experimental results (red line) fell within the 95% distribution of the simulated data (experiment: n = 6 mice; simulated: n = 3,000 repeats). c Similar to b, except for the Reversal session. d Licking probability of the experimental group and the simulated group in the first 10 trials of the 3 kHz stimulus in the Reversal session. The lick probability of each animal was calculated as the probability over 5 adjacent trials. The lick probability of the simulated group was calculated from the choices over 3,000 repeats. The decay of the licking probability was fit with an exponential function for each group. e Left: relationship between the model-estimated Q-value and the mean population responses to the 3 kHz stimulus in the first 10 trials in the Reversal session (n = 5 mice, 10 trials/animal). Right: relationship between the model-estimated prediction error (Δ) and the mean population responses to the air puff in FA trials in the Reversal session (n = 5 mice, 2-5 trials/animal). The activity of dual-function neurons correlated better with the model-estimated Q-value and prediction error than that of the rest of the population, and the slopes of the linear correlations were also steeper for the activity of dual-function neurons. Data points from the same mouse were marked with the 
same color. The data points corresponding to the responses of dual-function neurons and the rest of the population were marked with magenta and cyan outlines, respectively. P values test the significance of the correlation. Source data are provided as a Source data file.

Supplementary Fig. 11 Optogenetic inhibition of ACC delayed decision switch in the Reversal session. a Learning curve (FA rate) in the Reversal session. The thick grey and orange lines represent the mean performance of the mCherry (n = 6 mice) and eNpHR (n = 4 mice) groups, respectively, and thin lines represent the performance of individual animals. The FA rate was calculated with a sliding window of 5 trials. b The number of trials required for each mouse to reach the switching threshold (FA < 25%, indicated by the red dashed line in a).

Supplementary Fig. 12 Optogenetic activation of ACC neurons triggers licking response. a Lick raster of mCherry (top), ChR2 (middle), and eNpHR mice (bottom). Blue shade: optogenetic stimulation (488 nm, 20 Hz, duration: 1 s); yellow shade: optogenetic stimulation (589 nm, duration: 1 s). b Trial-averaged licking frequency of mCherry, ChR2, and eNpHR mice. Light green shade: optogenetic stimulation. c Licking probability pre-, during, and post-optogenetic stimulation. The ChR2 group showed higher licking frequency after optogenetic stimulation, illustrating that the photoactivation of ACC triggered rebound licking (mCherry: n = 6 mice; ChR2: n = 5 mice; eNpHR: n = 4 mice; two-sided two-way repeated measures ANOVA with post hoc Bonferroni multiple comparisons). d-e Optogenetic manipulation of ACC activity during 3 kHz stimulus presentation in a randomly selected 50% of go trials of the Stable session. d Raster plots of representative animals from the mCherry (left), ChR2 (middle), and eNpHR (right) groups during light-on and light-off trials. Dark and light gray bars mark the stimulus and response windows, respectively. e Lick rate during the stimulus (left) and response (right) windows (mCherry: n = 6 mice; 
ChR2: n = 4 mice; eNpHR: n = 4 mice; two-sided two-way repeated measures ANOVA with post hoc Bonferroni multiple comparisons). Data are presented as mean ± s.e.m. Statistical details are presented in Supplementary Table 1. Source data are provided as a Source data file.
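The legend of Supplementary Fig. 10 refers to a SARSA model that iterates a trial-by-trial Q-value from a prediction error and converts Q into a licking probability. The following is a minimal sketch of that kind of update, not the authors' implementation. The function names, the fixed learning rate `alpha`, the logistic form and parameters of the Q-to-P transform, and the outcome value of -1 for an air puff are all assumptions made for illustration; the legend specifies none of them.

```python
import math
import random


def sarsa_update(q, reward, alpha, gamma=0.0, next_q=0.0):
    """One SARSA step: return (updated Q, prediction error).

    delta = reward + gamma * Q(next state, next action) - Q(state, action).
    With gamma = 0 (assumed here because the outcome ends the trial) this
    reduces to delta = reward - Q.
    """
    delta = reward + gamma * next_q - q
    return q + alpha * delta, delta


def lick_probability(q, slope=5.0, bias=0.0):
    """Hypothetical Q-to-P transform: a logistic function of Q."""
    return 1.0 / (1.0 + math.exp(-slope * (q - bias)))


def simulate_reversal(n_trials=10, q0=1.0, alpha=0.3, punish=-1.0, seed=0):
    """Simulate licking to a formerly rewarded cue that is now punished.

    Q is updated only on trials where the simulated animal licks, because
    only then is an outcome (air puff, assumed value `punish`) delivered.
    Returns a list of (Q after trial, lick probability, licked, delta).
    """
    rng = random.Random(seed)
    q = q0
    history = []
    for _ in range(n_trials):
        p = lick_probability(q)
        licked = rng.random() < p
        delta = 0.0
        if licked:
            q, delta = sarsa_update(q, punish, alpha)
        history.append((q, p, licked, delta))
    return history


if __name__ == "__main__":
    for trial, (q, p, licked, delta) in enumerate(simulate_reversal(), start=1):
        print(f"trial {trial}: Q={q:.3f} P(lick)={p:.3f} licked={licked} delta={delta:.3f}")
```

In this sketch the learning rate is a constant, so the Q-value decays geometrically on punished trials. The abstract instead describes a learning rate set by the dynamic recruitment of ACC neurons, which would require `alpha` to vary across trials.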


Table 1: Statistics table
The table shows the statistics for all main text and supplementary figures. Statistical analyses were performed with scripts written in MATLAB (2020a, MathWorks) and GraphPad Prism 9 (GraphPad).
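One of the behavioral measures behind these statistics is the switching criterion in the legend of Supplementary Fig. 11: the FA rate is computed with a sliding window of 5 trials, and a mouse is considered to have switched once the FA rate falls below 25%. A minimal sketch of that computation follows. The function names are hypothetical, and the window alignment (each window covers 5 consecutive no-go trials and is credited to its last trial) and the trial-counting convention are assumptions, since the legend does not state them.

```python
def sliding_fa_rate(fa_flags, window=5):
    """FA rate over every run of `window` consecutive no-go trials.

    `fa_flags` holds 1 for a false alarm (lick on a no-go trial) and 0 for
    a correct rejection. Entry i of the result covers trials i .. i+window-1.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(fa_flags[i:i + window]) / window
        for i in range(len(fa_flags) - window + 1)
    ]


def trials_to_switch(fa_flags, window=5, threshold=0.25):
    """Number of no-go trials elapsed when the FA rate first drops below
    `threshold`, or None if the criterion is never met."""
    for i, rate in enumerate(sliding_fa_rate(fa_flags, window)):
        if rate < threshold:
            return i + window  # index of the window's last trial, 1-based
    return None


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    flags = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    print(sliding_fa_rate(flags))
    print(trials_to_switch(flags))
```

With a 5-trial window the FA rate can only take multiples of 20%, so under these assumptions the criterion FA < 25% amounts to at most one false alarm in five consecutive no-go trials.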