To make reasonable and adaptive decisions, organisms have to process information on outcomes and the corresponding costs, and they must decide to maintain or end an action once engaged. A large body of evidence indicates that the anterior cingulate cortex (ACC) is one of the crucial brain regions engaged in the evaluative processes and in inhibiting responses toward less desirable but easily obtainable goals in favor of more desirable goals that may also require more physical and/or mental effort (Floresco & Ghods-Sharifi, 2007; Hauber & Sommer, 2009; Schweimer & Hauber, 2005, 2006; Schweimer, Saft, & Hauber, 2005; Walton, Bannerman, & Rushworth, 2002; Walton et al., 2009; Walton, Rudebeck, Bannerman, & Rushworth, 2007; and reviewed in Assadi, Yücel, & Pantelis, 2009; Floresco, St. Onge, Ghods-Sharifi, & Winstanley, 2008; Holroyd & Yeung, 2012; Kurniawan, Guitart-Masip, & Dolan, 2011; Rushworth & Behrens, 2008). For instance, excitotoxic lesions to the ACC have been shown to reduce the preference for the high-cost–high-reward option in the cost–benefit T-maze task in which rats could make a choice of either climbing a barrier to obtain a large reward in one arm or running into the other arm without a barrier for a small reward (Schweimer & Hauber, 2005; Schweimer, Saft, & Hauber, 2005; Walton, Bannerman, Alterescu, & Rushworth, 2003; Walton, Bannerman, & Rushworth, 2002). Similar results have been obtained in experiments using a lever-pressing task, which have shown that ACC lesions in subjects cause a significant bias away from lever-pressing on a high fixed-ratio schedule to gain a high reward (Walton et al., 2009). However, the ACC is not responsible for all types of cost–benefit decision making. 
Rats with ACC lesions have been shown to perform equivalently to the control groups in deferred gratification tasks (Cardinal, Pennicott, Sugathapala, Robbins, & Everitt, 2001; Rudebeck, Walton, Smyth, Bannerman, & Rushworth, 2006; and reviewed in Walton, Rudebeck, Bannerman, & Rushworth, 2007) in which the subjects choose between smaller rewards provided immediately or larger alternatives if they waited longer.

Furthermore, it has been proposed that mesocortical dopamine (DA) fibers projecting to the ACC play an important role in effort-based decision making (Berger, Gaspar, & Verney, 1991). Rats undergoing DA depletion of the ACC or the specific blockade of DA receptors in the ACC altered their response bias, leading to an obvious switch to low-effort actions in the cost–benefit T-maze task (Schweimer & Hauber, 2006; Schweimer, Saft, & Hauber, 2005). However, there are discrepancies in the roles defined for the ACC dopamine system in effort-related decision making. It has been reported that ACC DA depletion via 6-hydroxydopamine lesions does not alter choice performance of rats in the cost–benefit T-maze task (Walton, Croxson, Rushworth, & Bannerman, 2005), whereas another research group found that such lesions significantly reduced the preference for the high-cost–high-reward option (Schweimer, Saft, & Hauber, 2005).

On balance, the roles of the ACC and its dopamine system in cost–benefit decision making still cannot be defined with certainty. One reason for this could be that, to date, the majority of studies on cost–benefit decision making have been conducted using behavioral tasks that offer the subjects a limited number of competing options that involve different combinations of cost expenditure and reward earning, usually one small or nonpreferred and the other large or preferred. The cost–benefit T-maze task has been widely employed in many decision-making studies (Floresco & Ghods-Sharifi, 2007; Hauber & Sommer, 2009; Rudebeck, Walton, Smyth, Bannerman, & Rushworth, 2006; Schweimer & Hauber, 2005, 2006; Schweimer, Saft, & Hauber, 2005; Walton, Bannerman, Alterescu, & Rushworth, 2003; Walton, Bannerman, & Rushworth, 2002). Meanwhile, other researchers have used a concurrent choice lever-pressing/chow-feeding task to assess effort–reward decision making (Schweimer & Hauber, 2005; Walton et al., 2009). Behavioral variations among individual subjects and the limited options provided by these tasks make it difficult to precisely measure and assess the alterations in the responses, which could be an important factor leading to the discrepancies in previous results. In the present study, we expanded upon such behavioral designs by enabling the subjects to make unrestricted decisions about their energy or time expenditures. We trained rats using either a self-paced effort-based behavioral paradigm called the “do more get more” (DMGM) task or a time–reward trade-off (TRTO) task. The only difference between these two tasks is the demand for behavioral effort. To maintain the nosepoke stance, the animals have to keep standing up in the DMGM task, which requires considerable behavioral effort. In contrast, the larger size and lower positioning of the nosepoke operandum in the TRTO task allow the subjects to effortlessly perform the nosepoke. 
In each trial of either task, the subjects could maintain the response for any duration that they decided and consequently obtain a water reward proportional to the nosepoke duration. In other words, the subjects decided among levels of effort or time spent to obtain different levels of reward. With such designs, we have increased the ability to assess cost–benefit decision making in a more detailed fashion. In addition, we employed a within-subjects design in each experiment.

The present study sought to examine the roles of the ACC and its dopamine system in cost–benefit decision-making tasks in which subjects could self-determine the effort/time cost on free-running trials. First, we reversibly inactivated the ACC by locally infusing the GABAa-receptor agonist muscimol into the ACC to determine whether the ACC was necessary for performing the DMGM and TRTO tasks. The results show that the DMGM, but not TRTO, task depends on an intact ACC. We then selectively blocked dopamine D1/D2 receptors in the ACC of rats to examine the potential roles of ACC dopamine receptors in DMGM decision making.

Method

Subjects

A total of 41 male Sprague-Dawley rats, eight weeks of age at the beginning of training, were used in these experiments. All animals were housed in groups of three under a constant temperature (23 ± 1 °C) and a 12-h light/dark cycle. To motivate behavior, rats had restricted access to water for 20 h before the behavioral sessions. They were allowed to access water for 30 min per day in their home cages before water deprivation and were given ad libitum access to water at least one day per week. Food was always available ad libitum. Rats maintained 80–90 % of their ad-lib body weight throughout the training and testing period. At surgery, rats weighed 280–350 g. All the experimental protocols used in the present study were in compliance with the NIH’s Guide for the Care and Use of Laboratory Animals (1996), and were approved and monitored by the Ethical Committee of Animal Experiments, Fudan University Institute of Neurobiology (Shanghai, China).

Apparatus

Rats were trained and tested using the DMGM task in a custom-made rectangular chamber (80 × 30 × 30 cm; LWH). The left panel of the chamber had a semicircular nosepoke hole (2.5-cm diameter) that was 15 cm above the floor and had an infrared nosepoke entry detector. The opposite end of the chamber was equipped with an infrared beam to detect the arrival of rats at the reward site and a solenoid valve to deliver the water reward (Fig. 1a). For the TRTO task, the semicircular arched nosepoke port (3-cm total width, 4.5-cm total height) was five times as large as the one in the DMGM task, and the lower edge of the port was on the floor of the chamber (Fig. 1b). This design in the TRTO task allowed the animal to trigger the infrared detector without standing up and to only invest time cost in maintaining the nosepoke stance. The apparatuses were controlled using in-house software written by Ji-Yun Peng.

Fig. 1

“Do more get more” and time–reward trade-off tasks. (a and b) Behavioral apparatuses for the “do more get more” (DMGM) task (a) and the time–reward trade-off (TRTO) task (b). (c) The main task events for both tasks included nosepoke, nose withdrawal, reward delivery, and return. (d) Cost–benefit relationships between the nosepoke duration and the amounts of water reward delivered in the full-reward and half-reward sessions. The same relationships were applied in both the DMGM and TRTO tasks. (e) Timeline depicting the habituation, initial training, baseline training, and testing procedures used in the anterior cingulate cortex (ACC) inactivation experiments and the ACC dopamine blockade experiments

Procedure

Habituation and initial training

For the first three days, rats were individually habituated to the behavioral apparatus for 20 min per day. After habituation and water restriction, each rat was placed in the operant chamber for one 45-min initial training session per day. During the initial training, the rat triggered the start of every single trial by poking its nose into the nosepoke hole of the chamber. The animal needed to stand up to do this in the DMGM task, but not in the TRTO task. After the rat ended the nosepoke and ran to the reward site on the opposite end of the chamber, the solenoid valve would pump a fixed amount of water (0.1 ml) for each correctly performed trial—that is, each trial on which the subject held the nosepoke stance for at least 800 ms and arrived at the reward site within 5 s after the withdrawal of its nose. All subjects typically learned the link between the nosepoke response and the water reward (animals performed at least 150 trials within 45 min and gained a reward in at least 70 % of the trials) within one to three sessions. The 800-ms threshold was established to help the animal distinguish a sustained nosepoke response from a single nosepoke action.

Free choice

After rats had established the link between the nosepoke and the reward, the DMGM/TRTO rules were introduced into the DMGM/TRTO tasks. The rats were still required to maintain the nosepoke stance for at least 800 ms to obtain a water reward. Additionally, under the DMGM/TRTO rules, the volume of the water reward was directly proportional to the nosepoke duration; that is, a longer nosepoke duration resulted in a correspondingly larger volume of water reward (Fig. 1d). Occasionally, rats would perform a multiple-attempt nosepoke in one trial. In that case, only the duration of the last nosepoke was used to determine the reward in that trial. Typically, rats were able to meet the performance criteria of the DMGM/TRTO task in 10–20 training sessions. In the ACC inactivation experiments, a daily training or test included one full-reward session with a total of 14 ml of water reward. In the dopamine blockade experiments, a half-reward session was added, in which the same nosepoke duration would result in half the amount of water, relative to that in the full-reward condition. In other words, the reward-to-cost ratio was reduced by 50 % in the half-reward condition (Fig. 1d). A daily training or test included one full-reward session and one half-reward session performed in succession, which provided 8 ml of water reward each and were arranged in a counterbalanced manner.
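The reward rule just described can be illustrated with a short sketch. This is only a minimal Python illustration, not the in-house control software: the function name `reward_ml` and the slope `ml_per_s` are hypothetical, because the text states that reward volume is proportional to nosepoke duration (Fig. 1d) without giving the proportionality constant, and any upper limit on reward per trial is not modeled.

```python
from typing import Sequence

MIN_HOLD_MS = 800.0  # minimum sustained nosepoke stated in the text


def reward_ml(poke_durations_ms: Sequence[float],
              ml_per_s: float = 0.1,      # ASSUMED slope, not given in the text
              half_reward: bool = False) -> float:
    """Water volume (ml) for one trial under the DMGM/TRTO rule."""
    last_ms = poke_durations_ms[-1]       # only the last attempt counts
    if last_ms < MIN_HOLD_MS:
        return 0.0                        # nonreinforced trial
    volume = ml_per_s * last_ms / 1000.0  # proportional to hold duration
    # Half-reward sessions pay half the volume for the same duration.
    return volume / 2.0 if half_reward else volume
```

Under the assumed slope, a trial with attempts of 300 ms and then 1,600 ms is rewarded for the 1,600-ms attempt only, and the same hold in a half-reward session earns half that volume.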

Task variables

The success rate, or the percentage of reinforced trials, was defined as (number of trials with nosepoke duration over 800 ms ÷ total number of trials in a session) × 100 %. The single-attempt rate was defined as (number of trials with a single-attempt nosepoke ÷ total number of trials in a session) × 100 %. The locomotion time was defined as the time interval between withdrawal from the nose port and arrival at the reward site.
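As a hedged illustration of these definitions, the sketch below computes the three variables for one session. The `Trial` record and its field names are assumptions made for this example and do not describe the actual data format. The text describes reinforced trials both as "at least 800 ms" and as "over 800 ms"; the sketch uses the former.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_HOLD_MS = 800.0  # reinforcement threshold from the text


@dataclass
class Trial:
    # Hypothetical record layout, for illustration only.
    poke_durations_ms: List[float]  # one entry per nosepoke attempt
    withdrawal_s: float             # time of final nose withdrawal
    arrival_s: float                # time of arrival at the reward site


def session_metrics(trials: List[Trial]) -> Dict[str, float]:
    """Success rate, single-attempt rate, and mean locomotion time."""
    n = len(trials)
    if n == 0:
        raise ValueError("session has no trials")
    # A trial is reinforced if its last nosepoke met the threshold.
    reinforced = sum(t.poke_durations_ms[-1] >= MIN_HOLD_MS for t in trials)
    single = sum(len(t.poke_durations_ms) == 1 for t in trials)
    locomotion = [t.arrival_s - t.withdrawal_s for t in trials]
    return {
        "success_rate_pct": 100.0 * reinforced / n,
        "single_attempt_rate_pct": 100.0 * single / n,
        "mean_locomotion_s": sum(locomotion) / n,
    }
```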

Performance criteria

Rats that met the following three criteria were considered to be subjects with stable performance: (1) >70 % success rate, (2) >70 % single-attempt rate, and (3) no significant differences in the main task variables (including success rate, single-attempt rate, average nosepoke duration, and average locomotion time) for at least three successive daily sessions.

Surgery

Rats that had been trained to a stable level of performance were anesthetized with sodium pentobarbital (40 mg/kg, i.p.). Two 22-gauge stainless-steel guide cannulas (0.7-mm outer diameter) were implanted into the ACC using standard stereotaxic techniques. The stereotaxic coordinates were as follows (Paxinos & Watson, 1998): 1.7 mm anterior to bregma, ±0.6 mm lateral from the midline, and 1.0 mm dorsoventral from the skull. Water deprivation was discontinued for three days for presurgery preparation and for one week for postsurgery recovery.

Drug treatment

Muscimol, SCH23390, and eticlopride (Sigma, MO, USA) were freshly dissolved in saline solution (0.9 %) before infusion. Muscimol was injected at a dose of 0.5 μg in 0.5 μl per site, and SCH23390 and eticlopride were injected at a dose of 1 μg in 0.5 μl per site. All injections were delivered through injection cannulae (0.38-mm outer diameter) over a 3-min interval. The injection cannulae were left in position for 1 min to allow for drug diffusion, and they protruded 1.5 mm beyond the guide cannulae. After the injection, the rat remained in its home cage for an additional 10 min before it was placed in the behavioral chamber for testing. All experiments in this study used a within-subjects design, with each subject in each experiment being tested with one drug infusion and one vehicle infusion in a counterbalanced order on the testing days. The drug and vehicle infusions were separated by 1 week of treatment-free training (Fig. 1e).

Histology

Rats were anesthetized with sodium pentobarbital (100 mg/kg, i.p.) and perfused transcardially with physiological saline and 10 % formal saline. The brains were removed and placed into sucrose solutions ranging from 10 to 20 to 30 % sucrose until they sank. The brains were sectioned with a freezing cryostat (Leica CM1900, Germany) at a 40-μm thickness. All sections were mounted and stained with neutral red. The infusion sites were examined under a light microscope (Olympus BX41, Japan) equipped with a CCD camera.

Data analysis and statistics

Data within groups were compared using paired Student’s t tests and one-way or two-way repeated measures analyses of variance (ANOVAs). The level of statistical significance (α level) was set at p < .05. The data in the text and figures are expressed as means ± SEMs unless noted otherwise. To construct the cumulative probability distributions of the rewards, the rewards earned in all single trials by all subjects were pooled for each experimental condition. Statistical comparisons of the cumulative distributions were made using the nonparametric Kolmogorov–Smirnov (KS) test, and the significance level was defined as p < .001, due to the large data set used in the KS test. Additionally, the distributions were depicted with their 95 % confidence intervals to help further examine the significance of the differences between treatments. These statistical analyses were conducted in SigmaStat (Systat Software, Germany) and MATLAB (MathWorks, Natick, MA).
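For readers who want to see what the two main test statistics measure, the following Python sketch computes a paired t statistic and a two-sample KS statistic using only the standard library. It is an illustration under stated assumptions, not the SigmaStat or MATLAB procedure used in the study: the function names are hypothetical, and p values are not computed.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(drug: Sequence[float], vehicle: Sequence[float]) -> Tuple[float, int]:
    """Return (t, df) for a paired t test on per-rat values."""
    if len(drug) != len(vehicle) or len(drug) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [d - v for d, v in zip(drug, vehicle)]
    n = len(diffs)
    # Mean difference divided by its standard error (undefined if all
    # differences are identical, because the standard deviation is zero).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest vertical gap between the two empirical cumulative distributions."""
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    for v in xs + ys:
        fa = bisect_right(xs, v) / len(xs)  # fraction of a at or below v
        fb = bisect_right(ys, v) / len(ys)  # fraction of b at or below v
        d = max(d, abs(fa - fb))
    return d
```

In this framing, a leftward shift of one cumulative reward distribution relative to the other appears as a large KS statistic, whereas overlapping distributions give a value near zero.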

Results

Baseline training

Rats were trained in the DMGM or TRTO task, in which the energetic or time cost expended in each sustained nosepoke led to a directly proportional water reward. The subjects typically learned the DMGM rule within 15 days. This learning was marked by a sharp decrease in the percentage of nonreinforced trials (nosepoke duration < 0.8 s) in the first 10 days of training, from 35.7 % ± 4.1 % on Day 1 to 12.1 % ± 1.9 % on Day 10, with the percentage reaching 11.1 % ± 1.7 % by Day 15 (Fig. 2a). At the same time, the peak of the distribution of nosepoke durations shifted from short responses (0.8–1.2 s) in the first five days to longer responses (1.6–2.0 s) after Day 10.

Fig. 2

Training data for the DMGM and TRTO tasks. (a and b) Acquisition of a sustained nosepoke response during the DMGM (a) and the TRTO (b) training, as measured by the percentages of total nosepokes performed within each duration range and across four selected time points in the first 15 days. (c) Neutral red staining of a coronal section, indicating the placement of the cannula tips. (d and e) The schematics depict the locations of the injection cannulae tips (●) in the ACC for rats in the inactivation experiments performing the DMGM (d) and TRTO (e) tasks. Cg1, cingulate cortex area 1; Cg2, cingulate cortex area 2

In contrast, the rats learned the TRTO task within ten days, which was slightly faster than the time to learn the DMGM task. The TRTO training procedure also showed a dramatic decrease in the proportion of nonreinforced trials in the first ten days, from 24.4 % ± 2.2 % on Day 1 to 9.0 % ± 0.7 % on Day 10 (Fig. 2b). The peak of the distribution of nosepoke durations emerged at 1.6–2.0 s after Day 5. On Day 15, the distribution of nosepoke durations formed a broad peak, with around 65 % of the nosepokes in the range of 1.2–2.4 s.

Inactivation of ACC impairs DMGM task performance

To determine the involvement of rat ACC in the DMGM decision-making paradigm, 12 well-trained rats were subjected to the inactivation protocol. Figure 2d represents the locations of the injection cannulae tips. To obtain a total of 14 ml of water reward, the same group of rats performed 76.8 ± 10.6 trials per session after saline infusion, and 209.8 ± 24.5 trials per session after muscimol treatment [t(11) = 5.05, p < .01]. Figure 3a shows the success rate, that is, the percentage of trials with a nosepoke duration over 800 ms. In the DMGM task, rats that had an infusion of saline sustained a nosepoke for longer than 800 ms and subsequently obtained a water reward in 87.7 % ± 3.1 % of the trials. In contrast, when the rats were treated with muscimol, the success rate was dramatically reduced, to 54.4 % ± 5.4 % [t(11) = 5.95, p < .01]. Excluding nonreinforced trials, the average nosepoke duration was also significantly decreased from 1,696 ± 139 ms after saline infusion, to 1,159 ± 35 ms after muscimol treatment (Fig. 3b; t(11) = 4.65, p < .01). However, intra-ACC infusion of muscimol had no significant impact on the single-attempt rate in reinforced trials (Fig. 3c; t(11) = 0.52, p = .62) or on the average locomotion time from nose withdrawal to the arrival at the reward site (Fig. 3d; t(11) = –1.78, p = .10). Figure 3e shows the cumulative probability distributions of water reward earned in each reinforced trial by all 12 rats. The distribution associated with muscimol treatment is significantly to the left of the distribution associated with the saline treatment (p < 10⁻⁴⁰, KS test; n{muscimol, saline} = 1,212, 783 trials), which indicates that rats performed more low-cost–low-reward trials when treated with muscimol. These results reveal that ACC inactivation severely impaired DMGM task performance.

Fig. 3

Inactivation of the ACC impairs performance of the DMGM task, but not of the TRTO task. (a and b) Effects of intra-ACC muscimol infusions on (a) the percentages of reinforced trials and (b) the average nosepoke durations in the DMGM and TRTO tasks. (c and d) Muscimol deactivation of the ACC had no impact on (c) the percentages of single-attempt trials or (d) the times from nose withdrawal to arriving at the reward site, in either the DMGM or the TRTO task. ** p < .01 versus saline, paired t test; n = 12 in the DMGM task, n = 11 in the TRTO task. (e and f) Cumulative probability distributions of rewards in the DMGM task (e) and the TRTO task (f)

Inactivation of ACC has no impact on TRTO task performance

To examine whether inactivation of the ACC alters responses in self-paced decision making on deferred gratification, 11 rats were trained and tested using the TRTO task. Figure 2e shows the locations of the injection cannulae tips. To gain a total of 14 ml of water reward in one session, the same group of rats performed 61.4 ± 6.4 trials per session after saline treatment and 67.9 ± 6.3 trials per session after muscimol treatment [t(10) = 1.63, p = .13]. As is shown in Fig. 3a–d, a bilateral intra-ACC injection of the GABAa-receptor agonist muscimol did not alter any behavioral variables of the rats in the TRTO task, when compared with their performance after they had received a saline injection. We observed no significant differences in success rates (Fig. 3a; t(10) = 0.10, p = .92), average nosepoke durations (Fig. 3b; t(10) = 1.73, p = .11), single-attempt rates (Fig. 3c; t(10) = –1.98, p = .08), or average locomotion times (Fig. 3d; t(10) = –1.04, p = .32) between the two treatments. As is shown in Fig. 3f, the cumulative probability distributions of earned water rewards for the muscimol and saline treatments overlap with each other (p = .0014, KS test; n{muscimol, saline} = 686, 623 trials). These results show that the muscimol-induced deactivation of the ACC did not affect TRTO task performance in rats.

Dopamine D1 receptors in the ACC are dispensable for DMGM decision making

To investigate the role of ACC D1 receptors in DMGM decision making, ten well-trained rats were used in a D1 blockade experiment. As is shown in Fig. 4a, the performance of rats with a bilateral microinjection of the D1 receptor antagonist SCH23390 showed no significant differences, as compared with their performance when they were treated with the same volume of physiological saline. A two-way repeated measures ANOVA was used to compare the performance of the rats in response to two different reward ratios over six different daily sessions. The results show a significant effect of reward ratio [F(1, 45) = 34.483, p < .001]. However, there was no significant daily session effect [F(5, 45) = 0.252, p = .936] or Reward Ratio × Daily Session interaction [F(5, 45) = 0.237, p = .944]. The data from the testing days were further subjected to paired Student’s t tests. The results showed that the average nosepoke duration in either full-reward sessions [t(9) = –0.32, p = .75] or half-reward sessions [t(9) = –0.02, p = .98] was not affected by blockade of D1 receptors in the ACC. On the testing days, the rats performed 36.5 ± 3.0 trials per full-reward session and 59.3 ± 9.6 trials per half-reward session after a saline treatment. After a SCH23390 treatment, the rats performed 37.3 ± 3.7 trials per full-reward session [t(9) = 0.21, p = .84] and 53.5 ± 6.28 trials per half-reward session [t(9) = –0.76, p = .47]. On testing days, the cumulative probability distributions of earned water rewards in full-reward sessions showed no significant difference between the SCH23390 and saline treatments (p = .1327, KS test; n{SCH23390, saline} = 347, 324 trials). For half-reward sessions, the distributions did show a statistically significant difference (p < 10⁻⁴, KS test; n{SCH23390, saline} = 503, 540 trials), but in examining the distribution curves, we found no practical deviation between the results from the SCH23390 and saline treatments (Fig. 4b). As is shown in Fig. 4c, D1 blockade had no impact on the average locomotion time in either the full-reward [t(9) = –1.68, p = .13] or half-reward [t(9) = –0.20, p = .85] sessions. Figure 4d shows the locations of the injection cannulae tips. These data suggest that D1 receptors in the ACC are not responsible for DMGM decision making.

Fig. 4

Blockade of D1 receptors has no impact on DMGM decision making. (a) Mean nosepoke durations of rats after undergoing ACC D1 blockade and after a saline control injection in the pretesting and testing sessions for both the full-reward and half-reward tests. (b) Cumulative probability distributions of earned water rewards in the testing sessions. (c) Locomotion times in the testing sessions. (d) Locations of the injection cannulae tips (●) in the ACC. n = 10 rats

Dopamine D2 receptors in the ACC are required for DMGM decision making

Eight rats demonstrating consistent performances were used in a D2 blockade experiment. Figure 5a shows the average nosepoke durations over the three-day testing procedure. A two-way repeated measures ANOVA revealed a significant effect of reward ratio [F(1, 35) = 23.760, p = .002] and a significant effect of the daily sessions [F(5, 35) = 6.002, p < .001]. Furthermore, a statistically significant interaction emerged between reward ratio and the daily sessions [F(5, 35) = 3.143, p = .019]. Data from the testing days were compared using paired Student’s t tests. The results showed that when the rats received an intra-ACC infusion of eticlopride, they performed the same as when receiving a saline infusion in the full-reward sessions [t(7) = –1.37, p = .21], but showed a significant reduction relative to the effects of saline treatment on nosepoke durations in the half-reward sessions [t(7) = –3.08, p = .02]. On testing days, the rats performed 56.9 ± 4.6 trials per full-reward session and 77.0 ± 7.6 trials per half-reward session after saline treatment. After eticlopride treatment, the rats performed 69.50 ± 8.45 trials per full-reward session [t(7) = 2.15, p = .07] and 111.9 ± 17.0 trials per half-reward session [t(7) = 2.52, p = .04]. Figure 5b shows the cumulative distributions of earned water rewards in the full-reward and half-reward sessions on testing days. There was no significant difference between the eticlopride treatment and the saline control treatment in the full-reward sessions (p = .0690, KS test; n{eticlopride, saline} = 417, 394 trials). However, the subjects undergoing ACC D2 blockade conducted more low-cost–low-reward trials in the half-reward sessions, as denoted by a significant leftward shift of the cumulative probability distribution (p < 10⁻¹², KS test; n{eticlopride, saline} = 686, 560 trials). As is shown in Fig. 5c, D2 blockade did not significantly affect the average locomotion times in either full-reward [t(7) = 2.32, p = .05] or half-reward [t(7) = 2.11, p = .07] sessions. Figure 5d depicts the injection sites within the ACC. These data show that blocking the D2 receptors in the ACC affected DMGM decision making when the amount of reward earned for the same amount of effort investment was reduced by half.

Fig. 5

Blockade of D2 receptors affects DMGM decision making. (a) Mean nosepoke durations of rats after undergoing ACC D2 blockade and after a saline control injection in the pretesting and testing sessions for both the full-reward and half-reward tests. (b) Cumulative probability distributions of earned water rewards in the testing sessions. (c) Locomotion times in the testing sessions. (d) Locations of the injection cannulae tips (●) in the ACC. * p < .05 versus saline, n = 8 rats, paired t test

Discussion

ACC and cost–benefit decision making

The results of the inactivation experiments suggest that the ACC is critical for DMGM decision making, but not for time cost decision making. Temporary inactivation of the ACC severely impaired the performance of all rats in the DMGM task, as they performed the task with a dramatically lower success rate and also expended much less effort in the reinforced trials. The effects introduced by inactivation of the ACC cannot be attributed to changes in primary food motivation/appetite or motor/spatial impairments because there was no detectable impairment in motor control or motivation for the water reward: neither the percentage of single-attempt trials (reflecting the level of motor control) nor the average time of running for the reward (reflecting locomotor capacity and motivation for the water reward) changed significantly after inactivation of the ACC. The unaffected single-attempt rate and travel times from nose withdrawal to the reward site also indicate that the decisions were made by the subjects even with ACC inactivation, because they did not attempt to redo their decisions before collecting the rewards. Moreover, other studies have shown that ACC inactivation did not impair various instrumental operations, such as climbing barriers, pressing levers, or making nosepokes (Schweimer & Hauber, 2005; Schweimer, Saft, & Hauber, 2005; Walton, Bannerman, Alterescu, & Rushworth, 2003; Walton, Bannerman, & Rushworth, 2002; Walton et al., 2009). Therefore, the dramatic decrease in the effort expenditures of all subjects in the DMGM task after ACC inactivation cannot be explained by impairment in the body/balance control that is required to maintain the nosepoke stance. 
In contrast, in the TRTO task, which was performed in a similar behavioral apparatus but required no behavioral effort for the subjects to maintain the nosepoke response, the inactivation of the ACC did not alter any behavioral variables, including success rate, average nosepoke duration, the proportion of single-attempt trials, and the average locomotion time. The results obtained from the TRTO task suggest that the ACC is not responsible for decision making on deferred gratification or for motor control, locomotor capacity, timing assessment, the discrimination of reward magnitude, impulsive tendencies, or the maintenance of primary motivation in the process of unconstrained cost–benefit decision making. The distinct results from the ACC inactivation experiments differentiate the DMGM and TRTO tasks and indicate that different neural circuits are involved in these two types of decision-making tasks. These results are also in agreement with previous findings that selective excitotoxic lesions of the ACC do not affect decisions on deferred gratification in rats, regardless of which type of task paradigm is used (Cardinal, Pennicott, Sugathapala, Robbins, & Everitt, 2001; Rudebeck, Walton, Smyth, Bannerman, & Rushworth, 2006).

In the last decade, a considerable body of research has demonstrated that the ACC and the dopamine input it receives play fundamental roles in effort-based decision making (Floresco & Ghods-Sharifi, 2007; Hauber & Sommer, 2009; Schweimer & Hauber, 2005, 2006; Schweimer, Saft, & Hauber, 2005; Walton, Bannerman, & Rushworth, 2002; Walton, Rudebeck, Bannerman, & Rushworth, 2007; Walton et al., 2009; and reviewed in Assadi, Yücel, & Pantelis, 2009; Floresco, St. Onge, Ghods-Sharifi, & Winstanley, 2008; Holroyd & Yeung, 2012; Kurniawan, Guitart-Masip, & Dolan, 2011; Rushworth & Behrens, 2008), though not in all forms of decision making involving the assessment of costs and benefits (Cardinal, Pennicott, Sugathapala, Robbins, & Everitt, 2001; Rudebeck, Walton, Smyth, Bannerman, & Rushworth, 2006; and reviewed in Walton, Rudebeck, Bannerman, & Rushworth, 2007). However, most of these studies employed tasks in which the subjects had to make an initial decision between two competing options and then complete the chosen response. The results from these tasks only reflect a preference between two options that are manipulated by the experimenters. On the other hand, the DMGM and TRTO tasks used in our present study assess cost–benefit decision making in a different manner: There is only one response operandum, so the subjects can assess the cost and benefit and then self-determine the effort/time cost they are willing to expend in each single trial. The major advantage of our tasks is that they allow one to measure the changes in energy/time investment in a more direct and precise fashion. In line with the well-demonstrated findings mentioned above, the results of our present study further suggest that the ACC is also a critical region involved in self-paced and effort-based decision making but not in decision making on time–reward trade-off.

ACC dopamine receptors and DMGM decision making

After determining that the ACC is necessary for DMGM decision making, we further investigated the roles of the ACC dopamine system in DMGM decision making using D1 and D2 antagonists and adding a half-reward condition under which the subject would get half of the reward it would have received under the full-reward condition for the same amount of effort. The results from the dopamine receptor blockade experiments reveal that the intra-ACC blockade of D2 receptors significantly reduced effort expenditure in the half-reward sessions. These findings suggest that DA input to the ACC acting on D2 receptors is involved in DMGM decision making. In contrast, intra-ACC infusion of the D1 receptor antagonist SCH23390 did not significantly influence the performance of the subjects in either full-reward or half-reward sessions. The relative distributions of D1 and D2 receptors in the prefrontal area are different (Gaspar, Bloch, & Moine, 1995; Goldman-Rakic, Lidow, Smiley, & Williams, 1992; Lidow, Goldman-Rakic, Gallager, & Rakic, 1991; Sesack, Snyder, & Lewis, 1995), and we cannot be certain that the doses of SCH23390 and eticlopride (1 μg each) used here were fully equipotent in behavioral terms. The doses of SCH23390 and eticlopride were based on previous studies and on data reported in the literature (Floresco & Magyar, 2006; Ragozzino, 2002; Schweimer & Hauber, 2006; Seamans, Floresco, & Phillips, 1998; Sun & Rebec, 2005). Previous studies have demonstrated that the dose of SCH23390 used in the present study, when microinjected into the ACC, altered rats’ response choices in the T-maze cost–benefit task (Schweimer & Hauber, 2006). Furthermore, we tested the effects of SCH23390 at a higher dose (2 μg per site) on five additional animals. We observed a marked loss of motivation in all subjects. They were obviously hypoactive at this dose and failed to obtain the total reward in a session before giving up responding. This suggests that this dose of SCH23390 severely impaired the brain functions of the subjects and that testing with this dose has no physiological relevance. Thus, the failure of intra-ACC blockade of the D1 receptors to affect DMGM decision making cannot be attributed to inadequate drug dosing.

The dopamine system has been implicated in regulating effort-based decision making. Systemic administration of a D2 antagonist to rodents has been shown to shift choices clearly toward the low-effort/low-reward options both in a cost–benefit T-maze task (Walton, Croxson, Rushworth, & Bannerman, 2005) and in a concurrent lever-pressing/chow-feeding task (Walton et al., 2009). Another research group, using a T-maze effort-based decision-making procedure, reported consistent results: Rats were more likely to choose the small-effort/small-reward arm after systemic injection of the D1 antagonist SCH23390 or the D2 antagonist haloperidol (Bardgett, Depenbrock, Downs, Points, & Green, 2009).

It has been proposed that the dopamine system modulates the activity pattern of prefrontal networks (Lapish, Kroener, Durstewitz, Lavin, & Seamans, 2007; Seamans, Gorelova, Durstewitz, & Yang, 2001; Trantham-Davidson, Neely, Lavin, & Seamans, 2004). These studies suggested that tonic, low concentrations of dopamine induce a long-lasting activation predominantly of D1 receptors (D1 state), whereas a predominant activation of D2 receptors (D2 state) can be induced by phasic, high concentrations of dopamine (Seamans, Gorelova, Durstewitz, & Yang, 2001; Trantham-Davidson, Neely, Lavin, & Seamans, 2004). Functionally, it has been suggested that the D1 state leads to an increase in network inhibition so that only strong inputs can persist in the prefrontal network. In contrast, the D2 state leads to a reduction of inhibition in the prefrontal network, which allows multiple types of information to be processed simultaneously. Furthermore, these two states fit well with the two main phases of decision making. Initially, the D2 state allows simultaneous representation of multiple types of information, which is essential for performing a cost–benefit analysis and outcome appraisal in the evaluation phase. Subsequently, the D1 state maintains the information about the selected response and shuts off other representations, which allows the animal to focus on the selected goal in the execution phase (reviewed in Assadi et al., 2009). According to this model, blockade of D2 receptors might impair the decision-making process in the DMGM task by disturbing the representation of a large amount of information in the ACC network.

Moreover, the ACC has been proposed to monitor conflict as a function of task difficulty (Botvinick, 2007; Botvinick, Braver, Barch, Carter, & Cohen, 2001; Botvinick, Cohen, & Carter, 2004). In line with this notion, our results suggest that the bias toward lower effort expenditure that followed blockade of D2 receptors in the ACC in the half-reward sessions can be attributed to the greater task difficulty and conflict in those sessions, in which the relative amount of effort required was higher than in the full-reward sessions: Rats had to invest more effort to gain an equivalent reward.