Multiple signals in anterior cingulate cortex

Highlights
• There are multiple signals in anterior cingulate cortex (ACC).
• ACC activity reflects value of behavioural change even after controlling for difficulty.
• ACC activity reflects updating of internal models even after controlling for difficulty.

When humans and other animals take a course of action they usually do so because they believe the benefits of doing so will outweigh the costs. There is an evolving understanding of the mechanisms underlying evaluation of one well-defined choice against another, which have been linked to ventromedial, orbital prefrontal, and intraparietal sulcal cortex [1,2,3,4,5]. There are also, however, times when animals decide whether it is worth acting at all, or evaluate whether it is worth continuing to engage in the current behaviour or to explore alternatives. This distinct pattern of decision-making is linked to ACC; ACC manipulations affect the ability of animals to initiate any action at all [6], weigh up the costs and benefits of actions [7,8], switch between actions as their values change [9,10], or explore alternative choices [11]. A series of recent studies has demonstrated the presence of activity changes in ACC that correspond to the types of signals that would be needed to guide such behaviour; these signals encode the values of actions [7,12,13,14,15,16,17,18], the average value of alternative courses of action in the environment ('search value') as opposed to the current or default course of action [19,20,21], and exploration and evaluation of hypotheses about the best course of action to take [22,23,24], and they reflect updating of decision-makers' beliefs and internal models of their environments [25,26]. Not only are such signals found in ACC but they are weak or absent in regions such as orbitofrontal and ventromedial prefrontal cortex that carry other value signals [12,19,20,22].
In addition, however, ACC has also been linked to 'conflict monitoring', the process of detecting when two competing choices might be made during a difficult task [27]. Detecting response conflict and task difficulty is important if mistakes are to be averted. Recently it has been argued that ACC activity interpreted as reflecting value signals has been confounded with difficulty, and that such activity is therefore more parsimoniously interpreted as simply reflecting task difficulty [28]. Here we review evidence, first, that value signals and, second, that model update signals can be separated from any effect difficulty exerts on ACC activity.
For example, a recent study [19] investigated how people decide whether to explore a set of alternative choices or stick with the opportunity to make a 'default' choice. The value of exploring was encoded by a 'search value' signal in ACC indexing the average value of the set of alternative choices that might be taken. In addition to search value, ACC activity was also influenced, in a negative fashion, by engage value (the value of the default option) and by costs incurred by searching. This pattern of positive and negative modulations is suggestive of a comparison process taking place within ACC that could inform decisions about whether or not to explore, or 'forage' amongst, the alternatives. Figure 1a, however, summarizes how difficulty might be confounded with the difference between search and engage value, a quantity sometimes referred to as the 'relative value of foraging' or RVF [28]. The probability of behavioural change (searching as opposed to 'engaging' with the current default) is plotted on the ordinate as a function of RVF. A confound between RVF and difficulty arises if subjects are biased to take the default. Even if the experiment examines decisions equally on either side of the objective indifference point (the point at which searching and engaging objectively have the same value), it is still possible that the sampling is unequal with respect to the subjective or empirical indifference point (the point at which a given participant has no preference between the options). The confound arises because decisions close to the subjective indifference point are the most difficult to take [for example, they are associated with long reaction times (RTs)]. If participants are so biased that they nearly always take the default option, then RVF and difficulty increase together across much of the decision space.
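The confound described above can be made concrete with a small numerical sketch. It is not the analysis used in [19] or [28]; the logistic choice rule, the `bias` and `slope` parameters, and the difficulty index (1 at the subjective indifference point, falling to 0 as choices become deterministic) are all illustrative assumptions. RVF is sampled symmetrically around the objective indifference point at 0.

```python
import numpy as np

def p_search(rvf, bias=0.0, slope=1.0):
    """Assumed logistic choice rule: probability of searching given RVF.

    `bias` is the subjective indifference point; bias > 0 means the
    participant prefers the default even when searching is objectively better.
    """
    return 1.0 / (1.0 + np.exp(-slope * (rvf - bias)))

def difficulty(rvf, bias=0.0, slope=1.0):
    """Illustrative difficulty index: 1 where p(search) = 0.5, 0 where choice is certain."""
    p = p_search(rvf, bias, slope)
    return 1.0 - 2.0 * np.abs(p - 0.5)

# RVF sampled evenly on both sides of the objective indifference point (0).
rvf = np.linspace(-3.0, 3.0, 61)

# Unbiased participant: subjective and objective indifference points coincide.
r_unbiased = np.corrcoef(rvf, difficulty(rvf, bias=0.0))[0, 1]

# Strongly default-biased participant: subjective indifference point at the
# far right of the sampled range, so difficulty rises with RVF throughout.
r_biased = np.corrcoef(rvf, difficulty(rvf, bias=3.0))[0, 1]

print(f"correlation(RVF, difficulty), unbiased: {r_unbiased:.3f}")
print(f"correlation(RVF, difficulty), biased:   {r_biased:.3f}")
```

Under these assumptions the correlation is essentially zero for the unbiased participant and strongly positive for the biased one, which is the situation in which RVF effects and difficulty effects cannot be told apart.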
Current Opinion in Neurobiology
Search value has an early and sustained effect on ACC activity, engage value impacts on ACC slightly later, and difficulty effects occur even later in the trial. (a) General linear model (GLM) timecourse analysis of ACC activity demonstrates effects of both search value (red) and engage value (blue). Note that RVF is a combination of search value and engage value. The results remain the same regardless of whether the regression included all the data from [19] and regressors indexing the cost of taking a foraging choice, difficulty, and/or log(RT). They remain the same even if, to further guard against any possibility of a confound, the analysis focused on the data that best discriminates between search value and difficulty. This can be achieved by focusing on a subset of the data. To ensure no correlation between RVF or search value and difficulty or log(RT), the easiest engage trials where p(forage) < 0.02 (lower panel) can be removed. The numbers of samples included are shown in blue in the lower panel while the excluded trials are shown in red. Forage frequency in the remaining trials is shown in the upper panel. The effects of log(RT) (c) and difficulty (d) appear late in the trial. Statistical significance of signals can be assessed by convolving the time-course of their beta-weights with a hemodynamic function (m = 6 and s = 3; to average the beta-weights of each contrast and every person separately). Search value had a significant effect on ACC (p < 0.001 in all cases). Difficulty had little impact on ACC activity as estimated using a standard hemodynamic function time-locked to the start of the trial or response cue onset, but the effect of difficulty and RT increased later in the trial. (e) HRF convolved average BOLD signal in ACC binned according to different parameters. When ACC activity is examined late in the trial period it can be seen that it increases with search value (e, i), difficulty (e, ii) and RVF (e, iii). When the same analysis is conducted earlier in the trial then only search value and RVF effects are apparent. All bins are equally sized for every participant and included at least 32 trials. Error bars are the standard error of the individual effects for each bin. Adapted from [19].

Experiments addressing this criticism must contain certain obvious features. First, a broad and evenly distributed range of search and engage value must be tested. However, at the same time, it is crucial that decisions are not trivially easy and that some value comparison occurs on each trial. Second, it is imperative that participants make decisions that really are guided by option values and do not always simply engage with the default option. One way of ensuring this is simply to provide adequate task training and instruction prior to scanning. If this is done then subjects make rational value-guided decisions and therefore subjective and objective indifference points are close (Figure 1c) and difficulty/value confounds disappear; many decisions are examined in decision space to the right of the subjective indifference point where RVF and difficulty are not positively correlated. Third, when analysing the data, rather than examining the neural correlates of the aggregate of decision variables (RVF), it is advisable to focus on the component values that determine RVF: search value, engage value, and costs. These component values are more easily dissociated from difficulty. Employing these principles Kolling and colleagues [19] reduced the shared variance between search value, engage value, and difficulty to 2% so that the neural correlates of each could be separately identified (Figure 2). Now it is clear that ACC activity reflects search value shortly followed by engage value, although towards the end of the decision period some variance in ACC activity is accounted for by difficulty and RT. Parallels can be drawn with recordings made in other brain areas concerned with value-guided decision making such as the intraparietal sulcus [29]; initially activity in intraparietal neurons reflects saccade value but then it transitions to reflect action-related factors.
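The logic of the third principle, that regressors sharing little variance can be estimated separately within one regression, can be sketched on synthetic data. This is only an illustration and not the pipeline of [19]: the trial count, the simulated effect sizes in `true_betas`, the noise level, and the helper `shared_variance` are all assumptions made for the example, and the regressors are random numbers rather than task variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 400  # hypothetical number of trials

# Synthetic, independently drawn regressors standing in for the task variables.
search = rng.normal(size=n_trials)
engage = rng.normal(size=n_trials)
difficulty = rng.normal(size=n_trials)

def shared_variance(x, y):
    """Proportion of variance shared by two regressors (squared Pearson r)."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Simulated signal: positive search effect, negative engage effect, and no
# difficulty effect (all effect sizes are assumptions of this sketch).
true_betas = np.array([1.0, -0.5, 0.0])
X = np.column_stack([search, engage, difficulty])
signal = X @ true_betas + rng.normal(scale=0.5, size=n_trials)

# Ordinary least squares with an intercept; all regressors compete for variance.
design = np.column_stack([np.ones(n_trials), X])
betas, *_ = np.linalg.lstsq(design, signal, rcond=None)

print("shared variance search/difficulty:", shared_variance(search, difficulty))
print("estimated betas (intercept, search, engage, difficulty):", betas)
```

Because the regressors share almost no variance here, the fitted weights stay close to the simulated ones. When two regressors are highly correlated, as RVF and difficulty can be, the same regression cannot apportion variance between them reliably.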
Such a pattern of results suggests that ACC is a neural network in which decisions to explore or not are taken: activity is affected by a search value signal (apparent throughout much of the trial period), but the network takes longer to make decisions using this signal and others when they are difficult [30] (and therefore some variance in dACC activity at the end of the decision period is accounted for by difficulty and RT). Biophysically plausible neural networks have been proposed [31] in which pools of neurons are active in proportion to the evidence favouring particular choices. If the representation of search value in ACC takes this form then the network activity should reflect both search value and difficulty. In fact, the prediction is that the impact of search value should scale with difficulty. Although it might be difficult to assess such a precise hypothesis with fMRI, such considerations suggest that conducting experiments with decisions involving extremely high search values may be unwise [28]; when decision difficulty is very low the network may resolve the decision and enter an attractor state so quickly that it will be difficult to see any effect of search value. In other words, exclusive sensitivity to search value, without sensitivity to difficulty, is not what a search-value-sensitive decision circuit predicts; instead, sensitivity to both search value and difficulty is expected. In the future, careful neurophysiological measurements will be essential for testing such potential mechanisms at the neuronal level, disentangling how aggregate measures such as the BOLD signal are derived from actual neural network operations.
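The idea that a competitive network takes longer to settle when a decision is difficult can be illustrated with a deliberately simplified, noise-free sketch. It is a two-unit rate model with leak and mutual inhibition, used here as a stand-in for the biophysically detailed networks of [31] and not a reimplementation of them; the function name `decision_time` and every parameter value are assumptions chosen for the example.

```python
def decision_time(input_a, input_b, inhibition=1.0, leak=0.5,
                  threshold=1.0, dt=0.01, max_steps=10000):
    """Time for either of two mutually inhibiting pools to reach threshold.

    Each pool is driven by its own input (standing in for the value of one
    option), decays through a leak term, and is suppressed by the other pool.
    Returns None if no pool reaches threshold within max_steps.
    """
    a = b = 0.0
    for step in range(1, max_steps + 1):
        da = (input_a - leak * a - inhibition * b) * dt
        db = (input_b - leak * b - inhibition * a) * dt
        # Firing rates cannot be negative, so activity is rectified at zero.
        a = max(0.0, a + da)
        b = max(0.0, b + db)
        if a >= threshold or b >= threshold:
            return step * dt
    return None

# Hard decision: the two options have similar value inputs.
t_hard = decision_time(1.0, 0.9)
# Easy decision: one option is clearly better.
t_easy = decision_time(1.0, 0.5)

print("time to threshold, hard decision:", t_hard)
print("time to threshold, easy decision:", t_easy)
```

In this sketch the easy decision reaches threshold sooner than the hard one, so the period during which value-related activity can be observed before the network commits is shorter when one option is clearly better.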
Furthermore, ACC is sometimes co-activated with adjacent medial frontal brain areas [26] and so an important consideration when drawing conclusions about ACC is to ensure that neural activity that is recorded really is drawn from ACC rather than adjacent medial frontal areas. After controlling for difficulty, search value effects are most prominent in ACC itself (Figure 3a) but task difficulty effects lie in more dorsal areas in or anterior to the presupplementary motor area (Figure 3b).
Humans and other animals should change from the behaviour they are currently engaged in and explore alternative courses of action not just when they have a sense of the value of those alternatives but also when they realise the environment is changing. ACC activity is also prominent when events suggest that a decision-maker's internal model of their environment should be updated [11,25,26]. By definition, surprising events are ones that were not predicted by the decision-maker's current model of their environment. In one such task, participants made saccades to targets (coloured dots) that, on each trial, appeared on a circular perimeter surrounding a fixation point. The dots' locations were usually predictable because they were similar over runs of 10-20 trials, but two types of unexpected event occurred. On model update trials (Figure 4a) the dot appeared in an unexpected location and its new colour indicated that future dots were likely to appear nearby on the circle's periphery. However, on surprise only trials (Figure 4b), dots appearing in white in a surprising location indicated one-off events and no need for participants to update their internal model of where future dots would appear. The difficulty of responding on any trial reflects the surprise associated with a particular stimulus value, $a$, and is characterized in Information Theory by its Shannon information $I_S(a)$:

$$I_S(a) = -\log p(a \mid \mathrm{prior})$$

where $p(a \mid \mathrm{prior})$ is the prior probability that the observation $a$ would be made, given the brain's internal model just before the data point was observed. Therefore the Shannon information captures how unexpected or unlikely a particular observation is, given the internal model, and is directly related to the difficulty of the trial.
In contrast, updating of the internal model is captured by the Kullback-Leibler divergence ($D_{KL}$) between the posterior and the prior:

$$D_{KL} = \sum_{a} p(a \mid \mathrm{post}) \, \log \frac{p(a \mid \mathrm{post})}{p(a \mid \mathrm{prior})}$$

where $p(a \mid \mathrm{prior})$ is the probability that the observation $a$ would be made, given the model just before $a$ was observed, and $p(a \mid \mathrm{post})$ is the same quantity, given the updated model just after $a$ was observed. $D_{KL}$ is the probability-weighted average change in Shannon information across all possible stimuli as a consequence of updating the model.

[...] response selection difficulty as indexed by RT, ACC was not (Figure 4h). Other studies similarly suggest ACC is activated when there is a need to update the task model even in the absence of any response selection difficulty (because no response is required at all) [32]. Model updating-related activity in ACC is, therefore, linked to behavioural flexibility and change and not simply response selection difficulty. This role of the ACC may underlie its activation during proactive control and error correction. It is possible that ACC activity in other experiments may have a similar role [21,33].
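The dissociation between surprise and model updating can be illustrated with a small worked example. It is a sketch under stated assumptions and does not reproduce the model fitted in the study: the circle is discretized into eight locations, the prior and posterior distributions are specified by hand, natural logarithms are used, and the function names `shannon_information` and `kl_divergence` are introduced only for this example.

```python
import numpy as np

def shannon_information(p_prior, observed):
    """Surprise of the observed location under the prior model (in nats)."""
    return float(-np.log(p_prior[observed]))

def kl_divergence(p_post, p_prior):
    """Kullback-Leibler divergence of the posterior from the prior (in nats)."""
    mask = p_post > 0  # terms with zero posterior probability contribute nothing
    return float(np.sum(p_post[mask] * np.log(p_post[mask] / p_prior[mask])))

# Hand-specified prior over 8 locations: dots are expected near location 0.
prior = np.array([0.65, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])
observed = 4  # a dot appears at an unexpected location

# Surprise only trial: the event is a one-off, so the model is left unchanged.
post_surprise_only = prior.copy()

# Model update trial: the model now expects future dots near location 4.
post_update = np.roll(prior, observed)

surprise = shannon_information(prior, observed)
update_surprise_only = kl_divergence(post_surprise_only, prior)
update_model = kl_divergence(post_update, prior)

print("Shannon information of the observation:", surprise)
print("D_KL on surprise only trial:", update_surprise_only)
print("D_KL on model update trial:", update_model)
```

In this toy case the observation is equally surprising on both trial types, because the Shannon information depends only on the prior, yet the divergence is zero on the surprise only trial and large on the model update trial. This is the property that allows the two quantities to be used as separate regressors.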
In summary, ACC carries multiple signals. ACC activity reflects both search value and the updating of internal models of the environment. In both cases, and in other reports [20,34,35], ACC is linked to behavioural change, invigoration of new responses, novel response strategies, and exploration. We have conceptualized search value as the average value of choices that might be taken in an environment but it could take many other forms depending on context. We and others have argued that some of these signals may have arisen in the context of the foraging choices that animals make as they decide to leave one foraging patch to explore another [15,19,20,36,37]. Advantages of this approach are that it situates ACC function within the context of a behaviour for which there has been substantial evolutionary pressure and it suggests ways of optimal modelling of both behaviour and neural activity. Similar processes are likely to underlie human behaviours such as task switching. Such a perspective holds great promise for making novel predictions about behaviour and neural mechanisms in a principled fashion.
Two regions within ACC, dorsal ACC (dACC) and perigenual ACC (pgACC) [19,20,38,39], carry related signals. Both areas are found in humans and macaques; each area has a distinctive pattern of interaction with wider brain circuits that is similar across species [40,41]. Similar areas are also present in rodents and again they mediate related aspects of behaviour [8,11,25,42]. Indeed, when a decision-maker has updated its internal model or is about to pursue an alternative course of action then it may be necessary to exert careful control over which actions are selected next. However, the same is true even when one manages to resist the attractions of an alternative course of action [43] or when attention has lapsed or errors have been made. In all these situations it is necessary to exert greater cognitive control and this may be brought about by interactions between ACC and lateral prefrontal cortex [16,23,24,44,45,46,47,48,49].

Conflict of interest
We have no conflicts of interest.

This study provides evidence that the activity of single neurons in ventromedial prefrontal cortex accords with a previously proposed neural network model of decision making. The neural dynamics observed are those that would be expected from a selection mechanism that is based on mutual inhibition of option-specific pools of neurons that compete as a function of the value input they receive.

8. Friedman A, Homma D, Gibb LG, Amemori K, Rubin SJ, Hood AS, Riad MH, Graybiel AM: A corticostriatal path targeting striosomes controls decision-making under conflict. Cell 2015, 161:1320-1333.
An extensive and elegantly conducted series of investigations into the interactions of the rodent prelimbic cortex (a region resembling pgACC in primates) and the striosome compartment of the striatum. Using electrophysiology, optogenetics, and stimulation, the authors isolate a corticostriatal pathway which has a causal role in changing cost-benefit trade-off decision making. They explain the circuitry of the pathway and the role played in it by local inhibition during cost-benefit decision making.

This study suggests a noradrenergic control of locus coeruleus onto ACC that inhibits reward history or internal model-based influences on behaviour and initiates truly stochastic exploration. The study highlights an important distinction as it separates at least two different forms of exploration, i.e. model-based value driven exploration, investigated in a series of other studies implicating ACC, from the stochastic or random exploration they investigate and which requires ACC disengagement.
40. Neubert FX, Mars RB, Sallet J, Rushworth MF: Connectivity reveals relationship of brain areas for reward-guided learning and decision making in human and monkey frontal cortex. Proc Natl Acad Sci U S A 2015.

41. Procyk E, Wilson CR, Stoll FM, Faraut MC, Petrides M, Amiez C: Midcingulate motor map and feedback detection: converging data from humans and monkeys. Cereb Cortex 2014.
Human fMRI experiments and monkey electrophysiological data meta-analyses are used to argue that reward feedback-related activity occurs in a precisely localizable and homologous ACC region in both species. They argue the ACC regions in both species have premotor-like functions, and are important when subjects are exploring how best to respond when a task is changing.