In many everyday situations, an organism must engage in behaviors when the outcomes of those behaviors are not certain. The uncertainty may be due to incomplete information (e.g., foraging in a new patch) or due to the outcome being probabilistic (e.g., betting on red in roulette). Unlike an ambiguous situation where an organism cannot know what outcome to expect, for probabilistic outcomes someone may well know the possible outcomes and their relative likelihoods but not know which outcome will actually occur in a given instance. Knight (1921) discriminated between these two classes of unknowns as uncertainty (unpredictable and immeasurable outcome likelihoods) and risk (predictable and measurable outcome likelihoods). A clear, and socially relevant, example of a situation involving risk is gambling – particularly games where skill and social competition cannot affect the likelihood of a win (e.g., slot machines). Experimental research has investigated a variety of factors involved in risky decision-making that aid in the development of a theoretical framework that may explain risky choice in humans (e.g., Mishra, 2014), nonhuman primates (e.g., Stevens, 2010), and other nonhuman animals (e.g., Kacelnik & Bateson, 1996). This research in risky choice has utility in experimentally investigating the social health problems involving risky decision-making – such as pathological gambling (Paglieri, Addessi, De Petrillo, Laviola, Mirolli, Parisi, et al., 2014). The purpose of the present paper is to demonstrate that preference for a risky-choice option in rhesus macaques is modulated by the inclusion of informative feedback that is situated in-between the choice and the outcome. This information cannot be used by the monkeys to modify their prior choice within a trial, and research in pigeons (Zentall & Stagner, 2011) has demonstrated that this information can lead to suboptimal choices. 
This paradigm may prove to be a good nonhuman animal model for gambling behavior in humans (Zentall, 2016a).

Experimental research on risky choice by nonhuman animals has a long history of investigating risk sensitivities (i.e., varying degrees of risk-prone or risk-averse choices). The procedures generally involve a choice between two options, one that offers a certain (or a relatively more likely) outcome and another that offers a relatively larger (or better) outcome with a lower probability of receipt. Although it has been shown that humans (e.g., Kahneman & Tversky, 1979) and nonhuman animals (e.g., Kacelnik & Bateson, 1996) are generally risk averse, this is not universal across species and contexts. Discovering the variety of conditions that modify risky choices can help identify the psychological processes that underlie risky decision-making (e.g., Heilbronner, Hayden, & Platt, 2010; Kacelnik & El Mouden, 2013; Sayers & Menzel, 2017). A variety of environmental contextual conditions have been shown to contribute to the propensity of subjects to choose a risky option over a relatively safe option (e.g., Heilbronner & Hayden, 2013). An obvious factor would be the likelihood that a risky choice will result in a favorable outcome (“win”). Xu and Kralik (2014) reported that rhesus monkeys generally displayed risk-prone choices in a two-choice task where the risky option and the safe option often delivered equivalent rates of food reward. Selecting the safe option resulted in a single, certain (p = 1.0) food reward, whereas the quantity and probability of the reward from the risky option varied across within-subject conditions (two items at 0.5 probability, four items at 0.25, eight items at 0.125). Monkeys favored the risky option in all those conditions, and only in a condition where the risky option had a lower expected value (offering two items at 0.125 probability) did the monkeys prefer the safe option.
Thus, the monkeys generally displayed risk-prone choices, but this preference could be overcome if the risky option delivered a relatively poor payoff. Similarly, capuchin monkeys also demonstrated a bias towards a risky option over a safe option and demonstrated near indifference between a suboptimal risky option and a safe option in one condition (De Petrillo, Ventricelli, Ponsi, & Addessi, 2015).
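The expected values behind the Xu and Kralik (2014) conditions described above follow from multiplying reward magnitude by probability. A minimal Python sketch of that arithmetic is given here; the magnitudes and probabilities are those quoted in the text, while the dictionary labels and function name are illustrative and do not come from the original study.

```python
# Expected value = reward magnitude x probability of receipt.
# Magnitudes and probabilities are those quoted in the text for Xu and Kralik (2014);
# the labels and the helper name are hypothetical, chosen for this illustration.
def expected_value(items, probability):
    return items * probability

safe_ev = expected_value(1, 1.0)  # one item, delivered with certainty

risky_conditions = {
    "two_at_0.5": (2, 0.5),
    "four_at_0.25": (4, 0.25),
    "eight_at_0.125": (8, 0.125),
    "two_at_0.125": (2, 0.125),  # the condition with the lower expected value
}

risky_evs = {label: expected_value(n, p) for label, (n, p) in risky_conditions.items()}

for label, ev in risky_evs.items():
    if ev == safe_ev:
        relation = "equal to"
    elif ev < safe_ev:
        relation = "below"
    else:
        relation = "above"
    print(f"{label}: EV = {ev} items, {relation} the safe option ({safe_ev})")
```

The first three risky conditions match the safe option in expected value, so a preference for them reflects risk proneness rather than reward maximization; only the last condition pays less on average than the safe option.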

Outside of the likelihood of a win, a variety of other contextual factors also modulate the propensity for human and nonhuman animals to favor the riskier of two available reward options (Heilbronner et al., 2010). For example, McCoy and Platt (2005) reported that macaques increased their preference for a risky choice when the magnitude of a win was increased, while the magnitude of the loss also increased to keep the expected values comparable, suggesting that the jackpot size has a disproportionate impact on risky decision-making. Additionally, Hayden and Platt (2007) demonstrated that introducing progressively longer inter-trial intervals, to space out the trial exposure, systematically resulted in reduced preference for the risky option. Thus, “repeated gambles” that allowed for a fast rate of gambling opportunities increased the likelihood of gambling relative to a lower rate of gambling opportunities. Although Kahneman and Tversky (1979) suggested humans are generally risk-averse when working for gains, when risky choices are presented as quick repeated trials, humans also appear more risk-prone (Silberberg, Murray, Christensen, & Asano, 1988). Thus, although humans and nonhuman animals may sometimes appear to differ in risky decision-making, especially when humans are assessed using hypothetical situations rather than direct experience (Hertwig, Barron, Weber, & Erev, 2004), they tend to make similar choices when humans are tested under conditions similar to nonhuman research subjects (Hayden & Platt, 2009; Lagorio & Hackenberg, 2010).

An additional environmental factor, which is the focus of the present experiment, is the inclusion of a stimulus predictive of the risky-choice outcome that bridges the time between selection of the risky option and the subsequent outcome. This “signal” has been shown to increase preference for the risky option in pigeons (e.g., Zentall & Stagner, 2011), starlings (e.g., Vasconcelos, Monteiro, & Kacelnik, 2015), rats (e.g., Chow, Smith, Wilson, Zentall, & Beckmann, 2016; although the effect was not observed in earlier studies with rats where the delay signal did not elicit sign-tracking; Trujano & Orduña, 2015), and 3-year-old children with mild developmental delays (Lalli, Mauro, & Mace, 2000). In instances where the risky choice results in a delayed and probabilistic outcome, the organism has the opportunity to wait in anticipation of the outcome. Under ordinary conditions there would not be a strong expectation of an upcoming reward during a delay; however, if the delay is bridged by a stimulus signaling the upcoming outcome, then subjects have the opportunity to expect an impending reward following a win. Under these conditions subjects are more likely to prefer a risky option over a safe option, and this observation has been termed the “signaling effect” (McDevitt, Dunn, Spetch, & Ludvig, 2016; Zentall, 2016b). Zentall and Stagner (2011) arranged a procedure where pigeons chose between a risky option (offering ten pellets at 0.2 probability or no pellets at 0.8 probability, expected value = two pellets) and a safe option (offering three pellets at 1.0 probability, expected value = three pellets). The outcomes following a choice were delayed 10 s and were either signaled (Experiment 1, win or loss cues were presented and correlated with the ten or zero pellet outcome, respectively) or functionally unsignaled (Experiment 2, stimulus cues were presented but not correlated with the outcome).
Zentall and Stagner reported that pigeons “irrationally” favored the risky option, at the expense of reward maximization, when the outcomes were signaled and “rationally” favored the safe option when the outcomes were not signaled. Other versions of this effect have been observed, dating back to Kendall (1974), and have been shown to be modulated by a variety of environmental contexts (Dunn & Spetch, 1990; Laude, Pattison, & Zentall, 2012; McDevitt, Spetch, & Dunn, 1997; Pattison, Laude, & Zentall, 2013).

This performance is paradoxical partly because it yields suboptimal outcomes that appear to contradict the optimality assumption in foraging theory (Stephens & Krebs, 1986). This apparent contradiction can be addressed by considering that animal decision-making mechanisms evolved in natural settings where information signaling a losing circumstance could be used to modify behavior adaptively (e.g., a predator could break off the chase of a prey that was certain to escape; Vasconcelos, Monteiro, & Kacelnik, 2015). This performance also conflicts with the classic win-stay lose-shift response pattern often found in decision-making paradigms (Harlow, 1949). The search for a mechanism that may explain this result has ruled out the most likely confound, that the suboptimal option would be functionally optimal if pigeons pecked the risky option more during the unsignaled period, resulting in a higher unit price (i.e., pecks per pellet) for that option (Hinnenkamp, Shahan, & Madden, 2017).

Accumulating evidence suggests that a key factor involved in suboptimal choice is that the loss-signal does not discourage risky choices to the same degree that the win-signal encourages risky choices. This includes evidence that delaying the win signals fails to encourage risky choices (McDevitt et al., 1997; Vasconcelos et al., 2015) and evidence that selectively enhancing the magnitude of a signaled risky-loss does not disrupt the control that the signaled-win maintains over encouraging risky choices (Fortes, Vasconcelos, & Machado, 2016). Laude, Stagner, and Zentall (2014) explored this issue by replicating the Zentall and Stagner (2011) procedure with pigeons choosing between a three-pellet safe option and a ten-pellet (at 0.2 probability) risky option. In this procedure, however, they ran probe tests where the loss-signal was superimposed over the win-signal. Early in training the superimposed loss-signal inhibited responding, but over time this inhibition decreased in efficacy. Thus, the loss-signal eventually lost its aversive properties while the win-signal continued to maintain responding.

McDevitt et al. (2016) advocated for a Signals for Good News (SiGN) hypothesis that attempts to explain the suboptimal performances in the Zentall and Stagner (2011) task. The mechanisms involved in signal-guided behavior include the informative aspects of the signal that guide adaptive decision-making, which often appears “rational,” and the attentional or motivational aspects of the win-signal that disproportionately encourage the seeking out of “good news” even when seeking good news is not adaptive to the situation, which often appears “irrational.” Rats (Nevin & Mandell, 1978), pigeons (Dinsmoor, Browne, & Lawrence, 1972), and humans (Fantino & Silberberg, 2010) have all demonstrated a preference bias towards informative stimuli that are correlated with reward (i.e., seeking “good news”) over equivalently informative stimuli that are correlated with non-reward (i.e., seeking “bad news”). However, pigeons will seek out information that provides bad news as long as the same response occasionally also produces good news (Dinsmoor et al., 1972). Fantino and Silberberg (2010) reported that humans may use the response that produces bad news if the feedback occasionally does not produce a stimulus and that absence of a stimulus can be used to infer good news. Rats have also responded to produce a bad-news stimulus under specific conditions where the bad news was functionally “not yet, but soon” feedback rather than “not now” feedback (Escobar & Bruner, 2009).

The present experiment utilizes the general procedure used in Zentall and Stagner (2011) to assess the effects of outcome signals on risky choices in rhesus macaques (Macaca mulatta). If the mechanism underlying the signaling effect is a general mechanism expressed across animal species, then it was predicted that the monkeys would show increased risky-choices under signaled outcome conditions (Prediction 1). To further evaluate whether signal-induced increases in risky choices were due to both sensitivity to signaled-wins and insensitivity to signaled-losses, the likelihood of a risky choice given a prior outcome was evaluated and it was predicted that, contrary to the win-stay lose-shift strategy, monkeys would continue to choose the risky option following a signaled-loss, but not following an unsignaled-loss (Prediction 2). Consistent with prior research showing that macaque monkeys’ risky choices are sensitive to the expected value of the risky option (e.g., Xu & Kralik, 2014), it was predicted that the monkeys’ performances would be sensitive to the local (i.e., 36 trials) expected value of reward from the risky option (Prediction 3). Finally, if a common (dominant) mechanism can explain the monkeys’ choices under these conditions, it was predicted that individual monkeys would all show similar response patterns in all of the analyses (Prediction 4).

Method

Subjects

Seven adult male rhesus monkeys (Macaca mulatta) participated in this experiment. Monkeys were not food-deprived or weight-reduced, and they had free access to water at all times. The monkeys were individually housed, but had visual access to other monkeys at all times. Monkeys had extensive prior experience operating the apparatus that utilized a joystick to move an onscreen cursor to earn pellets according to the programmed contingencies (Richardson, Washburn, Hopkins, Savage-Rumbaugh, & Rumbaugh, 1990). Monkeys started working at 0900 h for 4–5 days a week, except for Murph who worked 7 days a week. Sessions lasted until 120 trials were completed (often requiring approximately 1 h) or until a total of 5 h passed. About 60% of sessions had all 120 trials completed in Phase 1 and 73% of sessions were completed in Phase 2. This study complied with approved Georgia State University IACUC protocols and the USDA Animal Welfare Act, and the “Guidelines for the Use of Laboratory Animals.” Georgia State University is an AAALAC-accredited institution.

Apparatus

The monkeys operated the Language Research Center’s Computerized Test System (LRC-CTS; Richardson et al., 1990) to participate in this experiment. This system consisted of a personal computer with color monitor, digital joystick, and food pellet dispenser. The monkeys manipulated a joystick outside of the cage by reaching through the cage mesh. The joystick controlled a cursor on a computer monitor, and the computer was programmed to deliver 94-mg banana flavored pellets (Bioserve, Frenchtown, NJ, USA), as a consequence for making choices, through a dispenser interfaced to the computer using a relay box and output board (Keithley Instruments, Cleveland, OH, USA).

Procedure

Trials consisted of an inter-trial interval (ITI), a choice period, a delay-signal period, and an outcome period where food (or no food) was delivered, after which a new ITI commenced (Fig. 1). The ITI (lasting 8 s in Phase 1 and 6 s in Phase 2) consisted of a blank screen with a white background, and no programmed output resulted from manipulating the joystick. Following the ITI, two distinct colored clipart icons (representing the choice options) appeared in the upper-left and upper-right corners of the screen (their left-right assignments were randomly determined between trials) along with a red cursor located in the lower center part of the screen. The joystick moved the cursor, and moving the cursor into contact with one of the two clipart icons registered a choice selection. The selection of either option resulted in a delay-signal period (lasting 12 s in Phase 1 and 9 s in Phase 2) where the cursor and the unselected clipart icon disappeared and the background flashed every 0.25 s between white and another color (red, blue, green, yellow, black, or gray depending upon the signaling condition and experimental phase). The designated safe option always resulted in two pellet deliveries following the delay-signal period. The risky alternative option resulted in either zero pellets (i.e., a loss) with a 0.8 probability or eight pellets (i.e., a win) with a 0.2 probability. Following the delay-signal period, if appropriate, food pellets were delivered (0.5 s per pellet) in the outcome period and the nonwhite background color corresponding to the flash remained constant during the pellet deliveries. The ITI commenced following the delivery of the final pellet or immediately following the delay period if no pellets were forthcoming.
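The payoff and timing parameters in this paragraph can be combined into a short calculation. The Python sketch below computes the expected pellets per choice and a lower bound on trial duration for each phase; it assumes that choice latency (which the text does not specify) is excluded, and the constant and function names are illustrative only.

```python
# Parameters are taken from the Procedure. Choice latency is not given in the text,
# so the durations computed here are lower bounds that exclude time spent choosing.
PELLET_TIME_S = 0.5
PHASES = {1: {"iti_s": 8, "delay_s": 12}, 2: {"iti_s": 6, "delay_s": 9}}

SAFE_PELLETS = 2
RISKY_WIN_PELLETS = 8
P_WIN = 0.2

def trial_duration(phase, pellets):
    """ITI + delay-signal period + pellet delivery time, excluding choice latency."""
    timing = PHASES[phase]
    return timing["iti_s"] + timing["delay_s"] + pellets * PELLET_TIME_S

risky_ev = P_WIN * RISKY_WIN_PELLETS  # expected pellets per risky choice
safe_ev = float(SAFE_PELLETS)         # pellets per safe choice

print(f"Expected pellets per choice: risky = {risky_ev:.1f}, safe = {safe_ev:.1f}")
for phase in PHASES:
    safe_s = trial_duration(phase, SAFE_PELLETS)
    win_s = trial_duration(phase, RISKY_WIN_PELLETS)
    loss_s = trial_duration(phase, 0)
    print(f"Phase {phase}: safe = {safe_s} s, risky win = {win_s} s, risky loss = {loss_s} s")
```

Under these assumptions the risky option pays 1.6 pellets per choice on average against 2 for the safe option, and a Phase 1 trial ending in a safe outcome takes at least 21 s, so 120 such trials take at least 42 min before choice latencies are added.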

Fig. 1

Illustration of the progression of a trial. (A) Blank ITI period. (B) Choice period with clipart icons and cursor. (C) Delay-signal period (assuming the left icon was chosen), in which the background oscillated between white and an alternative color every 0.25 s. (D) Outcome period with the nonwhite background color and the chosen icon shown. Pellets, if scheduled, were delivered

The signaling conditions determined whether the flashing cue during the signaled-delay was predictive of the outcome. In the signaled condition, choosing the risky option would result in a red flash when eight pellets were scheduled to be delivered and a yellow flash when zero pellets were scheduled to be delivered. Thus, red flashing was an immediate cue predictive of a win and a yellow flash was an immediate cue predictive of a loss. Choosing the safe option would result in a blue flash with a 0.2 probability and a green flash with a 0.8 probability, both resulting in a two-pellet delivery. Thus, blue and green flashes did not differentially signal the outcome, but were included to keep the variability in delay-signal consistent with the risky option. In the unsignaled condition, choosing the risky option or the safe option resulted in the same corresponding reward contingencies as in the signaled condition; however, the flash color during the signaled-delay period never differentially signaled the outcome from the risky option. The unsignaled-delay flash color differed between phases (always black flash in Phase 1, and black flash following a safe choice and a gray flash following a risky choice in Phase 2).
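The mapping from condition, phase, choice, and outcome to flash color can be summarized as a small lookup function. This is a sketch of the contingencies as described above, not the original experimental program; it assumes that the signaled-condition colors were the same in both phases (the text does not state otherwise), and the function and argument names are hypothetical.

```python
import random

def flash_color(condition, phase, choice, win=None, rng=random):
    """Return the nonwhite flash color for the delay-signal period.

    condition: "signaled" or "unsignaled"; phase: 1 or 2;
    choice: "risky" or "safe"; win: True/False, used only for risky choices.
    Assumption: signaled-condition colors do not differ between phases.
    """
    if condition == "signaled":
        if choice == "risky":
            # Red predicts the eight-pellet win, yellow predicts the zero-pellet loss.
            return "red" if win else "yellow"
        # Safe choices: blue (p = 0.2) or green (p = 0.8), both followed by two pellets.
        return "blue" if rng.random() < 0.2 else "green"
    # Unsignaled condition: the color never depends on the risky outcome.
    if phase == 1:
        return "black"
    return "gray" if choice == "risky" else "black"

print(flash_color("signaled", 1, "risky", win=True))
print(flash_color("unsignaled", 2, "risky", win=False))
```

In the unsignaled condition the returned color is identical for risky wins and risky losses, which is what makes the cue uninformative about the outcome even in Phase 2, where it still identifies which option was chosen.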

New clipart icons represented the safe and risky choice options at the start of each session. Thus, monkeys’ icon preferences would reflect experience with the present contingencies rather than prior sessions’ contingencies. Although the stimuli the monkeys selected during a session were randomized between sessions, the signaled-delay stimuli were not counterbalanced between subjects. Exposure to the signaled and unsignaled delay-signal conditions was randomly determined between sessions. Forced choice trials occurred in four-trial blocks (two safe trials, one risky-win trial, and one risky-loss trial, randomly presented) located at session thirds (trials 1–4, 41–44, and 81–84). Thus, sessions were segmented into three within-session components composed of 40 trials (four forced and 36 choice trials). To ensure that the monkeys were attending to the forced choice icons, forced choice trials presented both icons, but only one icon would successfully register a selection, while contacting the alternative would immediately reset the trial by returning the cursor to its start-of-trial position.
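The session layout described above (three 40-trial components, each opening with four forced-choice trials) can be expressed as two small helper functions. This is an illustrative Python sketch; the function names are not from the original software.

```python
TRIALS_PER_SESSION = 120
COMPONENT_LENGTH = 40
FORCED_PER_COMPONENT = 4

def component_of(trial):
    """Return the within-session component (1 to 3) for a 1-indexed trial number."""
    if not 1 <= trial <= TRIALS_PER_SESSION:
        raise ValueError("trial must be between 1 and 120")
    return (trial - 1) // COMPONENT_LENGTH + 1

def is_forced(trial):
    """Forced-choice trials are the first four trials of each component."""
    return (trial - 1) % COMPONENT_LENGTH < FORCED_PER_COMPONENT

forced_trials = [t for t in range(1, TRIALS_PER_SESSION + 1) if is_forced(t)]
free_per_component = COMPONENT_LENGTH - FORCED_PER_COMPONENT

print(forced_trials)
print(free_per_component)
```

The list of forced trials reproduces the positions given in the text (trials 1–4, 41–44, and 81–84), leaving 36 free-choice trials per component.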

The experiment progressed through a pilot phase and two experimental phases. The pilot phase was discontinued due to the absence of an effect in three of the four monkeys piloted; its procedure and data are included in the supplementary materials. Following the pilot phase, Phase 1 continued for at least 20 sessions and involved an 8-s ITI and a 12-s delay-signal period. In the unsignaled sessions the screen flashed between black and white regardless of the option chosen (risky or safe) and regardless of the risky choice outcome (win or loss). Phase 2 continued for at least 45 sessions for each monkey and reduced the ITI duration to 6 s and the delay-signal period to 9 s; this was done to increase session completion rates. Also, in the unsignaled condition of Phase 2, the delay-signal flash color depended upon whether the risky or safe option was selected. If the safe option was selected, then a black-white flash was presented, and if the risky option was selected, then a gray-white flash was presented. This change was included to help the monkeys discriminate between risky and safe choices, but it did not differentially signal risky wins and risky losses. Table 1 lists the number of sessions in the signaled and unsignaled conditions for each monkey in each phase.

Table 1 Number of signaled and unsignaled sessions for each monkey and phase

Data analysis

A multilevel modeling approach was applied to all the data using the R package lme4 (Bates et al., 2016). Except where noted, data from sessions that ended before reaching the 120-trial criterion were included in analyses and forced choice trials were excluded from all analyses. For all of the following analyses the continuous variables were centered and the categorical variables were effect-coded to address issues of multicollinearity with the interaction terms. For all models the random effects of intercept and slope factors were allowed to vary at the individual subject level. The primary dependent measure of interest was the choice between the risky option (1) or the safe option (0) for each (non-forced) trial within a session. A generalized mixed effects model specified a binomial error distribution for the choice variable, and the fixed effects predicted the probability of making a risky choice as a function of the condition factor (i.e., signaled vs. unsignaled), successive trials within a session, and their interaction. The trial variable was transformed using a natural logarithm. This model is functionally equivalent to a repeated measures logistic regression, but has the advantage of handling unbalanced data and can simultaneously provide model estimates at both the group level (to show the generality of the effects) and the subject level (to show individual differences between subjects) (Gelman & Hill, 2006). This model was tested against a null model (a model identical to the above, except it excluded the condition factor) by a likelihood ratio test and likelihood ratios were reported to assess the size of the effects (Wagenmakers & Farrell, 2004). To determine whether a prior trial’s outcome contributed to the likelihood of making a risky choice another model predicted the probability of risky choice as a function of the previous trial’s choice and outcome (risky-win, risky-loss, safe outcome), the condition factor, and their interaction.
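The primary choice model described above can be written out explicitly. The following is a sketch rather than the authors' exact specification: it assumes that every fixed-effect term has a corresponding subject-level random effect with an unstructured covariance matrix, and that the effect coding uses +1 and −1; the symbols are introduced here for illustration.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(y_{ij} = 1) &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,C_{ij}
  + (\beta_2 + u_{2j})\,T_{ij} + (\beta_3 + u_{3j})\,C_{ij}\,T_{ij}, \\
T_{ij} &= \ln(t_{ij}) - \overline{\ln t}, \\
(u_{0j}, u_{1j}, u_{2j}, u_{3j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma).
\end{aligned}
```

Here $y_{ij} = 1$ denotes a risky choice by monkey $j$ on free-choice trial $i$, $C_{ij}$ is the effect-coded condition (assumed $+1$ for signaled and $-1$ for unsignaled), $t_{ij}$ is the trial number within the session, and $T_{ij}$ is its centered natural logarithm. Under these assumptions, the null model that excludes the condition factor drops $\beta_1$, $\beta_3$, and the random effects $u_{1j}$ and $u_{3j}$, which removes two fixed effects and seven variance-covariance parameters, nine parameters in total.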

The average amounts of food obtained within a session in the signaled condition and the unsignaled condition were compared in order to determine whether differential amounts of pellet reward between the two conditions could explain differential risky-choice preferences between those two conditions. First, the proportion of risky choice (risky choices/(risky choices + safe choices)) was aggregated within each within-session component (i.e., the block of four forced-choice and 36 free-choice trials) and analyzed as a function of the proportion of the obtained risky-win outcomes (risky wins/(risky wins + risky losses)). This included both the signaled-condition sessions and the unsignaled-condition sessions. The outcome factor was normally distributed and a linear mixed effects model was used to describe the data. However, the model that included the interaction term (signal-condition by proportion of risky-win) did not perform better than a comparison model that excluded the interaction term (determined by a likelihood ratio test) for Phase 1 (χ2(5) = 2.8485, p = 0.7233) or Phase 2 (χ2(5) = 1.8146, p = 0.8742). As a result, the simpler model without the interaction term was used. Second, to determine the effect of the condition factor on total food obtained in a session, a generalized linear mixed effects model predicted total pellets earned in a session as a function of the condition factor. Only sessions where all 120 trials were completed were included in this analysis because otherwise the model would have tracked differences in the number of trials completed between the two conditions rather than differences in the amount of food obtained. A Poisson error distribution was specified for the food outcome variable because earned pellets are measured in discrete units.
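The p-values attached to the likelihood ratio tests above are upper-tail probabilities of a chi-square distribution with the stated degrees of freedom. As a hedged illustration of that step (not the authors' code, which used R), the following standard-library Python sketch computes the tail probability for integer degrees of freedom; the function name is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two base cases: df = 1 and df = 2.
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    # Recurrence: sf(df = k + 2) = sf(df = k) + half**(k/2) * exp(-half) / gamma(k/2 + 1)
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

# Test statistics and degrees of freedom as reported in the text.
p_phase1 = chi2_sf(2.8485, 5)
p_phase2 = chi2_sf(1.8146, 5)
print(round(p_phase1, 4), round(p_phase2, 4))
```

Applied to the reported statistics, the function should land close to the reported values of p = 0.7233 and p = 0.8742, which serves as a quick consistency check between each test statistic, its degrees of freedom, and its p-value.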

Results

The primary analysis, investigating the effect of within-session trial exposure and signaling conditions on risky choice (Prediction 1), demonstrated an overall effect with a trial-by-condition interaction in Phase 1 (z = 2.74, p = 0.006) and Phase 2 (z = 3.93, p < 0.001). The likelihood ratio test showed that the full model was over 1,000 times more likely than the null model (the same model, but excluding the signal-condition factor) for data in Phase 1 (χ2(9) = 672.26, p < 0.001) and Phase 2 (χ2(9) = 2899.2, p < 0.001). Figure 2 shows the within-subject risky-choice model fits for Phases 1 and 2. At the group level, for both phases, risk preference at the beginning of the session was nearly equal for the signaled and unsignaled conditions (~0.50 risk preference in Phase 1, ~0.43 risk preference in Phase 2; in both phases 0.5 risky choice fell within the 95% CI at the start of the session) and diverged with prolonged exposure within a session. The probability of risky choice increased across trials in the signaled condition, ending near 0.60 (Phase 1) and 0.64 (Phase 2); however, 0.5 risky choice remained within the 95% CI in both phases. The probability of risky choice decreased across trials in the unsignaled condition, ending near 0.41 (Phase 1) and 0.25 (Phase 2), and 0.5 risky choice fell outside of the 95% CI in both phases. Note that the lack of difference in risky choice between the two signal conditions at the start of a session in both phases was anticipated because the monkeys did not know the choice-contingency association at the start of a new session when new choice icons were introduced. At the individual subject level, apart from Chewie in Phase 1 and Hank in both phases, risky choices were higher in the signaled condition compared to the unsignaled condition for all monkeys, although the shape of the function and the magnitude of that preference varied between subjects (Prediction 4).

Fig. 2

Overall (fixed effect, top panels) and individual (random effect, bottom panels) model fits for the probability of risky choice (±s.e.) as a function of successive trials within a session and condition for Phase 1 (left panels) and Phase 2 (right panels)

The probability of making a risky selection was also a function of the prior choice and outcome (Prediction 2). The model that included the signal-condition factor was over 1,000 times more likely than the null model that excluded that factor for Phase 1 (χ2(22) = 4364.9, p < 0.001) and Phase 2 (χ2(22) = 5519.2, p < 0.001). Figure 3 shows the model fit for the probability of risky choice as a function of the signaling condition and the prior trial’s outcome. Tukey’s HSD post hoc test was used to evaluate the pairwise comparisons between the levels of the signaling factor (signaled, unsignaled) and the prior outcome factor (safe, loss, win). The following pairwise differences between prior trial outcomes were statistically significant. In Phase 1 there was a difference between signaled-safe and signaled-loss (z = 3.02, p = 0.03), between signaled-safe and signaled-win (z = −4.63, p < 0.001), between signaled-loss and signaled-win (z = −2.94, p = 0.04), and between unsignaled-safe and unsignaled-win (z = −3.35, p = 0.01, Phase 2). In Phase 2 there was a difference between signaled-safe and signaled-loss (z = 3.57, p = 0.01), between signaled-safe and signaled-win (z = −4.58, p < 0.001), between unsignaled-safe and unsignaled-win (z = −5.27, p < 0.001), and between unsignaled-loss and unsignaled-safe (z = 4.41, p < 0.001). The following pairwise differences between signaling conditions were statistically significant: In Phase 2 there was a statistically significant decrease in risky choice between signaled-safe and unsignaled-safe (z = 5.35, p < 0.001) and signaled-win and unsignaled-win (z = 2.75, p = 0.06). Most of the individual monkey model fits resembled the overall group model, where reductions in risky choices in the unsignaled condition were mostly driven by prior safe outcomes, but there were some notable deviations (Prediction 4).
Murph’s risky choices between signaling conditions appear to be driven by changes following prior risky-wins and risky-losses, and Chewie does not appear to show any sensitivity to prior outcomes, but rather shows a general decrease in risky choice in the unsignaled condition.
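The prior-outcome analysis above conditions each choice on the choice and outcome of the preceding trial. The Python sketch below shows only the raw tabulation behind such an analysis (the proportion of risky choices following each type of prior outcome), not the mixed model itself; the function name and the example trial sequence are invented for illustration and are not data from the experiment.

```python
from collections import defaultdict

def risky_rate_by_prior_outcome(trials):
    """trials: ordered list of (choice, pellets) with choice in {"risky", "safe"}.

    Returns {prior_outcome: proportion of risky choices on the next trial},
    where prior_outcome is "safe", "win", or "loss".
    """
    counts = defaultdict(lambda: [0, 0])  # prior outcome -> [risky choices, total]
    for (prev_choice, prev_pellets), (choice, _) in zip(trials, trials[1:]):
        if prev_choice == "safe":
            prior = "safe"
        else:
            prior = "win" if prev_pellets > 0 else "loss"
        counts[prior][0] += int(choice == "risky")
        counts[prior][1] += 1
    return {prior: risky / total for prior, (risky, total) in counts.items()}

# Hypothetical session fragment, invented for illustration only.
example = [("risky", 0), ("risky", 8), ("risky", 0), ("safe", 2), ("safe", 2), ("risky", 0)]
rates = risky_rate_by_prior_outcome(example)
print(rates)
```

A strict win-stay lose-shift strategy would produce a risky-choice rate of 1.0 after a win and 0.0 after a loss, so a rate that stays high after losses is the pattern Prediction 2 anticipates for the signaled condition.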

Fig. 3

Overall (fixed effect, top panels) and individual (random effect, bottom panels) model fits for the probability of risky choice (±s.e.) as a function of the prior outcome and condition for Phase 1 (left panels) and Phase 2 (right panels)

To determine whether monkeys’ preference for the risky option was influenced by the expected value of their recently experienced risky outcomes (Prediction 3), a model predicted the proportion of risky choice as a function of the obtained proportion of risky wins and the signaling condition. Monkeys were generally more likely to favor the risky option in components that resulted in higher rates of risky-wins. Risk preference was consistently higher in the signaled condition compared to the unsignaled condition. The model that included the condition factor was over 1,000 times more likely to account for the data than the null model excluding that factor in Phase 1 (χ2(4) = 32.681, p < 0.001) and Phase 2 (χ2(4) = 147.9, p < 0.001). Figure 4 shows the linear relationship of the proportion of risky choice as a function of the proportion of risky-wins for both signaling conditions. In Phase 1, for each 5% increase in the experienced proportion of risky-wins there was approximately a 3.7% increase in preference for the risky option regardless of the signaling condition, and the probability of making a risky choice was approximately 4% greater in the signaled condition compared to the unsignaled condition. In Phase 2, for each 5% increase in the experienced proportion of risky-wins there was approximately a 2.4% increase in preference for the risky option, and the probability of making a risky choice was approximately 14% greater in the signaled condition compared to the unsignaled condition.

Fig. 4

Overall (fixed effect, top panels) and individual (random effect, bottom panels) model fits for the proportion of risky choice within a component (±s.e.) as a function of the proportion of risky wins within a component and condition for Phase 1 (left panels) and Phase 2 (right panels)

To investigate whether the greater preference for the risky option in the signaled condition was due to differences in the number of pellets obtained in those conditions, the total number of pellets earned was predicted as a function of signaling condition. In Phase 1, there was no relationship between the signaling condition and the average food earned within a session (z = −0.7, p = 0.45; on average, 232 pellets were obtained in the signaled condition and 235 pellets were obtained in the unsignaled condition). The model that assumed differences in total earned food at the end of the session did not perform better than the null model that assumed no difference (χ2(3) = 2.87, p = 0.41). In Phase 2, more pellets were obtained in the unsignaled condition (238 pellets) than in the signaled condition (227 pellets) (z = −2.9, p = 0.004), and the model including the condition factor was over 1,000 times more likely than the null model that excluded the condition factor (χ2(3) = 54.82, p < 0.001). This reduction in food earned in the signaled condition of Phase 2 is expected, because the signaled condition encourages risky choices and increased preference for the risky option should translate to less food earned, but the difference was only 11 fewer pellets on average. Thus, given the absence of an effect in Phase 1 and only a small (4.6%) difference in Phase 2, the effect of condition on food earned was not considerable. Regardless, preference for the risky option in the signaled condition was not driven by an average increase in pellet reward in that condition.
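The modest size of the pellet difference can be put in context by computing the expected pellets in a complete 120-trial session from the programmed contingencies. The Python sketch below is an idealization, not part of the reported analysis: it assumes a single constant probability of choosing the risky option across all free-choice trials, and the names are illustrative.

```python
# Programmed contingencies from the Procedure; the constant-preference assumption
# (one fixed p_risky for all 108 free-choice trials) is a simplification.
FORCED_PELLETS = 3 * (2 * 2 + 8 + 0)  # per block: two safe trials, one risky win, one risky loss
FREE_TRIALS = 3 * 36
SAFE_PELLETS = 2
RISKY_EV = 0.2 * 8                    # expected pellets per risky choice

def expected_session_pellets(p_risky):
    """Expected pellets in a complete session for a fixed risky-choice probability."""
    if not 0.0 <= p_risky <= 1.0:
        raise ValueError("p_risky must be between 0 and 1")
    per_free_trial = (1 - p_risky) * SAFE_PELLETS + p_risky * RISKY_EV
    return FORCED_PELLETS + FREE_TRIALS * per_free_trial

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p_risky = {p:.2f}: expected pellets = {expected_session_pellets(p):.1f}")
```

Under this idealization, expected earnings range from 252 pellets (always safe) to 208.8 pellets (always risky), and each 0.1 increase in the probability of a risky choice costs about 4.3 pellets per session, so even sizable shifts in risk preference produce only small changes in total food earned.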

Discussion

The present experiment demonstrated that macaques’ risky decision-making was affected by the inclusion of information about the outcome of the prior risky choice. Overall, monkeys were more likely to favor the risky option under the signaled condition compared to the unsignaled condition. This is consistent with findings frequently reported with pigeons (e.g., Zentall & Stagner, 2011), rats (e.g., Chow et al., 2016), and humans (e.g., Lalli et al., 2000), and this finding supports the first prediction. This result in macaques is similar to research showing that monkeys prefer options that provide information (signaled) over options that do not provide information (unsignaled) about an upcoming outcome. Bromberg-Martin and Hikosaka (2009), for example, offered water-deprived macaques two options that delivered either 1.0 ml or 0.04 ml of water randomly with equal probability, but water delivery was delayed approximately 2 s with a cue in the delay that was either correlated with the outcome or unrelated to the outcome. Both monkeys favored the informative option. Using a similar procedure, Blanchard, Hayden, and Bromberg-Martin (2015) demonstrated that macaques would favor the informative option over the uninformative option at the expense of maximizing the rate of water reward delivery. Unlike Bromberg-Martin and Hikosaka and Blanchard et al., the present experiment did not assess preference for information, but rather assessed the influence of information (or its absence) on preferences for a risky option. Regardless, it is curious that signaling in this situation leads to suboptimal performance when signals often aid optimal decision-making.
For example, intertemporal choice experiments with macaques have shown that including a signal in the post-reward buffer delay (that equates the total trial duration between a smaller-sooner trial and larger-later trial) results in an increase in preference for the “optimal” larger-later reward option (Blanchard, Pearson, & Hayden, 2013; Pearson, Hayden, & Platt, 2010). This signal provides information that allows the monkeys to dissociate choice-outcome contingency from the post-choice delays to improve their choices that would otherwise appear to be “impulsive.”

The similarity between the experimental findings from the present study, Bromberg-Martin and Hikosaka (2009), and Blanchard et al. (2015) may reflect a common mechanism where signaled-losses fail to affect performance in the same manner that signaled-wins do – a finding in agreement with the SiGN hypothesis (McDevitt et al., 2016). In the present experiment, and the experiments by Bromberg-Martin and Hikosaka and Blanchard et al., the risky option occasionally provided good news. If the signaled-win functioned as good news that encouraged responding for the risky option, then it would be reasonable to expect that a signaled-loss would function as bad news that would discourage responding on the risky option and cancel out the effects of the good news, or worse, have an overall impact of discouraging responding on the risky option because a signaled-loss was encountered more frequently. However, the loss-signal appears to play no significant role in affecting choices (Fortes et al., 2016; Laude et al., 2014; Vasconcelos et al., 2015). In the present study the signaled-loss outcome was more likely to be followed by continued selections of the risky option than by a switch to the safe option, demonstrating that the monkeys were insensitive to the signaled losses. However, this pattern cannot be attributed to the signaling factor, because the monkeys also showed a tendency to return to the risky option following a risky-loss in the unsignaled condition. It appears that the effect of the signaled condition was to produce a general increase in the likelihood of choosing the risky option, regardless of the prior choice and outcome. Thus, while it appears that the monkeys were insensitive to the signaled-loss (consistent with the SiGN hypothesis and Prediction 2) in the present experiment, that insensitivity does not appear to explain why monkeys are more likely to choose the risky option in the signaled condition.
The lag analysis confirmed that the selection of an option often predicted a return to that option, which is a deviation from the win-stay lose-shift response pattern. Thus, Prediction 2 was partially supported by evidence that the monkeys were insensitive to the loss-signal, but was not supported by the evidence that unsignaled losses also resulted in continued risky choices.

Monkeys’ risky choices were sensitive to the local expected value of reward obtained from the risky choice. Though the probability of a risky-win was set at 0.2, there was natural variance in the obtained probability of a risky win across the 36 (free choice) trial components. The rate of selecting the risky option would not affect the programmed rates of winning, and as a result this natural variance in win rate allowed for an assessment of the sensitivity of risky choice to the rate of risky-wins. By Phase 2, all monkeys showed an increase in risky choice in components with a higher proportion of risky-wins, and the sensitivity to the rate of risky-wins was equivalent between the signaled and unsignaled conditions. This sensitivity to the expected value of the risky choice is consistent with prior research on risk sensitivity in macaques (Xu & Kralik, 2014) and supported Prediction 3. Equivalent sensitivity to the expected value of the risky choice occurred under both signaled and unsignaled conditions, but the proportion of risky choices was higher in the signaled condition.
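To give a sense of how much the obtained win rate can vary by chance, the sketch below computes the binomial standard deviation of the obtained proportion of risky-wins when the programmed probability is 0.2. The numbers of risky choices per component used here (10, 20, and 36) are hypothetical values chosen for illustration, not values reported in the study.

```python
import math

P_WIN = 0.2  # programmed probability of a risky-win, from the text


def sd_of_obtained_win_rate(p_win, n_risky_choices):
    """Binomial standard deviation of the obtained win proportion."""
    return math.sqrt(p_win * (1 - p_win) / n_risky_choices)


# Hypothetical numbers of risky choices within a component.
for n in (10, 20, 36):
    sd = sd_of_obtained_win_rate(P_WIN, n)
    print(f"n = {n:2d}: SD of obtained win rate = {sd:.3f}")
```

With few risky choices in a component, the obtained win rate can stray well away from 0.2 by chance alone, which is the kind of variation the analysis relies on.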

There was an observed minor (4.6%) increase in food obtained in the unsignaled condition over the signaled condition in Phase 2. This reinforces the point that the mechanism influencing risky choice through the signaling effect is not dependent upon the objective rate of reward. Furthermore, the absence of a statistically significant interaction between the signaling condition factor and the expected value factor on risky choices further supports this point (refer to the parallel model fits between the signaling conditions observed in Fig. 4). If the signaling effect functioned by selectively increasing sensitivity to the expected value of the risky option in the signaled condition, then it would be predicted that the slope of the signaled-condition model fit would be steeper than the slope of the unsignaled-condition model fit.
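The link between parallel fits and the absence of an interaction can be illustrated with a small noise-free example. The sketch below does not use the study's data or its multi-level model. It builds two hypothetical lines from the Phase 2 effects described in the Results (read as a slope of 0.48 and an additive offset of 0.14, both assumptions about scale) with an arbitrary intercept, and shows that the difference between the two per-condition slopes, which is what an interaction term estimates in a simple linear model, is zero.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical, noise-free illustration (not the study's data).
win_rates = [0.05, 0.10, 0.20, 0.30, 0.40]
INTERCEPT = 0.30  # arbitrary; no intercept is reported in the text
SLOPE = 0.48      # 2.4 points per 5-point step in win rate (Phase 2)
OFFSET = 0.14     # signaled minus unsignaled (Phase 2), assumed additive

unsignaled = [INTERCEPT + SLOPE * w for w in win_rates]
signaled = [INTERCEPT + OFFSET + SLOPE * w for w in win_rates]

# In a simple linear model with a condition-by-win-rate interaction,
# the interaction coefficient equals this difference in slopes.
interaction = ols_slope(win_rates, signaled) - ols_slope(win_rates, unsignaled)
print(f"slope difference (interaction): {interaction:.6f}")
```

A signaling effect that worked by raising sensitivity to expected value would instead appear as a positive slope difference, with the gap between the two lines widening as the win rate increases.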

The advantage of the multi-level modeling approach used here is that it allowed a comprehensive model to capture both the generality of the effect of the signaling condition in the fixed effects and the differential sensitivity to the signaling factor between individual monkeys in the random effects. The present results show that while the signaling effect was consistent in six of the seven monkeys, there were individual differences in the expression of the sensitivity to the signaling condition that were hidden in the global fixed effects model. For example, the global model evaluating the effect of the signaling condition across session exposure (Fig. 2) predicted that the monkeys would show increases in risky choice in the signaled condition and decreases in risky choice in the unsignaled condition. However, monkey Lou tended to show increases in risky choice across a session in both signaling conditions, but the increase was greater in the signaled condition. In contrast, monkeys Murph and Chewie tended to decrease risky choices across the session, but the decrease was not as great in the signaled condition. Thus, the signaled condition generally resulted in more risky choices relative to the unsignaled condition, but it was expressed differently across monkeys. Individual differences were also observed in the influence of prior outcomes on risky choices. Chewie’s risky choices, for example, appeared to show little to no sensitivity to the prior outcome factor (Fig. 3), but he still showed a general increase in risky choices in the signaled condition, suggesting that a specific prior outcome does not drive the signaling effect. Murph’s increased preference for the risky option in the signaled condition appears to be driven by prior risky-win and risky-loss outcomes, while the majority of monkeys showed the largest increases in risky choices following a safe outcome.
These individual differences in the sensitivity to the prior outcome further suggest that a uniform mechanism explaining the signaling effect will not be found in data evaluating the immediate prior choice and outcome. Thus, the present data do not appear to provide strong support for Prediction 4.

Risky choice procedures are frequently described as offering the opportunity to “gamble” (Paglieri et al., 2014); however, the outcome signals that result in irrational decision-making may capture an important environmental determinant involved in the real-world gambling problems that humans experience. This connection between the signaling effect and gambling behavior has face validity in that the signaled delay represents the anticipation phase that bridges the wager and the outcome – the procedure resembles many gambling events where information between the wager and the outcome can generate anticipation for a potential win. Beyond appearances, however, there are reasons to believe that the signaling effect captures a psychologically meaningful aspect embedded in the traditional gambling environment. First, the neurological basis for gambling disorders may be based in reward anticipation (e.g., Linnet, 2014) and the signaled-delay procedure accentuates that aspect of the risky choice contingency. Second, this task has shown promise as a means to identify individuals with a gambling habit. Molet, Miller, Laude, Kirk, Manning, and Zentall (2012) arranged a task for college undergraduates that closely paralleled a task that they used to investigate the signaling effect in pigeons (Zentall & Stagner, 2011). The undergraduates also were asked about their real-world gambling habits, and it was reported that those students with a “gambling habit” were more likely to choose the signaled risky option in this task. However, they did not assess whether students with a gambling habit were also more likely to choose the risky option in an unsignaled condition; thus, the contribution of the signal has not yet been isolated.
Furthermore, the argument that the signaled-reward procedure captures an important aspect involved in potentially problematic gambling behavior is supported by research that shows that gambling disorders could be due to dysfunctional reward anticipation where dopaminergic neurons in humans showed increased activity (measured by a PET scan) under a procedure with a 0.5 probability win in an Iowa Gambling Task (Linnet, Mouridsen, Peterson, Møller, Doudet, & Gjedde, 2012). The Bromberg-Martin and Hikosaka (2009) information-seeking task with macaques also included neural recording of dopaminergic neurons in the substantia nigra and ventral tegmental area. Those neurons were responsive to both water delivery and also the signals predictive of upcoming water delivery, and Bromberg-Martin and Hikosaka argued that the reward-signal appeared to have functioned as a reward in and of itself.

To summarize, the present signaled-outcome procedure demonstrates another contextual factor that modulates risky choice in nonhuman primates. The effect of signaling wins may be related to a mechanism that allows the anticipation of the impending reward to provide its own rewarding properties, and this may tie into gambling behavior in humans. Further research may utilize the signaled-outcome procedure to see how it relates to other psychological tendencies between subjects. For example, Laude, Beckmann, Daniels, and Zentall (2014) demonstrated that pigeons that were less delay tolerant (i.e., more likely to choose a smaller-immediate reward over a larger-delayed reward) were also more likely to favor a suboptimal risky option than pigeons that showed greater tolerance to delay. This suggests that risky choice and delay intolerance (sometimes characterized as impulsive choice) are linked traits, a suggestion further supported by research showing that humans with a problem gambling habit are increasingly more likely to discount hypothetical delayed monetary rewards as a function of the severity of problem gambling (Alessi & Petry, 2003). Other avenues for exploring the utility of the signaled-outcome procedure as it relates to gambling behavior include the investigation of whether signaling affects cognitive appraisals of the likelihood of a risky-win. Linnet, Frøslev, Ramsgaard, Gebauer, Mouridsen and Wohlert (2012) demonstrated that while experienced poker players had accurate probability estimation skills, inexperienced poker players and pathological poker players had distorted estimation of probabilities. Nonhuman primates have been shown to share similar cognitive biases with humans (e.g., Santos & Rosati, 2015), and the reward-signal may bias preference by modifying cognitive appraisals of the likelihood of winning.
For example, macaques can show confidence in the likelihood that a choice will result in reward or not by “betting” on their choice (i.e., confident bets make correct/incorrect outcomes greater in magnitude, whereas hedged bets result in reduced rewards for correct responses and a small consolation reward for incorrect responses; Middlebrooks & Sommer, 2012). Similarly, chimpanzees demonstrated confidence in receiving a reward from making a discrimination response by moving to a distally located pellet dispenser before any feedback about their response was given, so that they could collect a pellet that would otherwise be lost. At the same time, they demonstrated lack of confidence by not traveling to the dispenser to try to retrieve a pellet (Beran, Perdue, Futch, Smith, Evans, & Parrish, 2015), and these “confidence movements” aligned consistently with objective task performance. In the present case it might be that monkeys would show more confidence in a gambling win under the signaled condition than in the unsignaled condition. Alternatively, they may make equal judgments of the likelihood of a risky-win in both conditions, but favor the risky option in the signaled condition regardless. Future research using the procedure utilized here with macaques may demonstrate that a signaled-reward condition (akin to the present experiment) also affects judgment about the likelihood that a risky choice will result in a win.