Differential Responding by Rhesus Monkeys (Macaca mulatta) and Humans (Homo sapiens) to Variable Outcomes in the Assurance Game

Behavioral flexibility in how one responds to variable partner play can be examined using economic coordination games in which subjects play against a variety of partners and therefore may need to alter their behavior to produce the highest payoff. But how do we study this behavioral flexibility once players have settled on a response? Here, we investigated how responding by rhesus monkeys (Macaca mulatta) and humans (Homo sapiens) playing a computerized single-player version of a coordination game, the Assurance game, changed as a function of the variable responses (Stag/Hare) generated by multiple simulations (SIMs). We were interested in whether individuals could track and differentially respond to changing frequencies of Stag and Hare play by the SIMs, especially with regard to the payoff-dominant (Stag-Stag) outcome, something that could not be examined with real partners because they quickly settled on the Stag response. For both monkeys and humans, there was a linear relationship between the proportion of Stag play by the subject and the likelihood of the Stag choice by the SIM, such that both species increased their use of Stag as the SIM increased its use of the Stag response. However, humans more closely matched their proportion of Stag responses to that of the SIM, whereas monkeys adopted a different, but equally effective, strategy of exploiting the higher-paying Stag alternative. These results suggest that monkeys and humans demonstrate sensitivity to a dynamic game environment in which they encounter variable contingencies for the same response options, although they may employ different strategies to maximize reward.

What is key, then, is flexibility; ideally, individuals should be able to change their strategy depending upon their partner's pattern of play. This can be examined using game-theoretic coordination games in which subjects play against a variety of partners or in different situations and therefore might need to flexibly alter their behavior to achieve the outcome with the highest payoff. Understanding how individuals adapt to changing contingencies for their response options can help us understand how they might adapt to changing play by a partner or opponent, or to different patterns of response due to environmental changes. In the present study, we used this logic to explore flexibility of responses in an economic game. We investigated how rhesus monkeys (Macaca mulatta) and humans (Homo sapiens) responded to variable patterns of play produced by simulations (hereafter SIMs), each of which employed a different response pattern in the Assurance game, a coordination game.
A normal-form game in economics has three elements: the set of players, the set of actions or strategies (played simultaneously), and payoff functions that map the actions of players into payoffs. The Assurance game is a two-player, two-strategy normal-form game. A pair of actions, one for each player, is called a Nash equilibrium if each action is a payoff-maximizing response to the other. There are two Nash equilibria in the Assurance game, one of which is payoff-dominant, meaning that no other Nash equilibrium yields a higher payoff. The second Nash equilibrium is risk-dominant, meaning that there is less risk (or, as in this case, no risk) to a player if the other player deviates from the Nash equilibrium. The oft-studied Prisoner's Dilemma game differs from the Assurance game in that its payoff functions result in a single Nash equilibrium that is (payoff) dominated by a non-equilibrium outcome with higher payoffs for both players.
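As a minimal sketch of these definitions, the Python below enumerates the pure-strategy Nash equilibria of a symmetric 2 × 2 game by checking every unilateral deviation, then picks out the payoff-dominant one. It assumes the Assurance payoffs used in this study (Stag-Stag = 4 units, Stag-Hare = 0, Hare = 1 regardless of the partner's choice). The Prisoner's Dilemma values (5, 3, 1, 0) are illustrative textbook numbers, not values from this study, and all function names are hypothetical.

```python
from itertools import product

ASSURANCE_STRATEGIES = ("Stag", "Hare")
PD_STRATEGIES = ("Cooperate", "Defect")


def assurance_payoff(own, other):
    """Reward units for a player: Stag-Stag = 4, Stag-Hare = 0, Hare = 1 regardless."""
    if own == "Stag":
        return 4 if other == "Stag" else 0
    return 1


def pd_payoff(own, other):
    """Illustrative Prisoner's Dilemma payoffs (assumed): T = 5, R = 3, P = 1, S = 0."""
    table = {
        ("Cooperate", "Cooperate"): 3,
        ("Cooperate", "Defect"): 0,
        ("Defect", "Cooperate"): 5,
        ("Defect", "Defect"): 1,
    }
    return table[(own, other)]


def pure_nash_equilibria(payoff, strategies):
    """Profiles (a, b) in which neither player can gain by deviating unilaterally.

    The game is symmetric, so the column player's payoff is payoff(b, a).
    """
    equilibria = []
    for a, b in product(strategies, repeat=2):
        row_ok = all(payoff(a, b) >= payoff(alt, b) for alt in strategies)
        col_ok = all(payoff(b, a) >= payoff(alt, a) for alt in strategies)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def payoff_dominant(equilibria, payoff):
    """Equilibria that give both players at least as much as every other equilibrium."""
    def pays(profile):
        a, b = profile
        return payoff(a, b), payoff(b, a)

    return [
        e for e in equilibria
        if all(pays(e)[0] >= pays(f)[0] and pays(e)[1] >= pays(f)[1] for f in equilibria)
    ]


def worst_case(payoff, own, strategies):
    """Lowest payoff a strategy can yield, whatever the other player does."""
    return min(payoff(own, other) for other in strategies)


assurance_eq = pure_nash_equilibria(assurance_payoff, ASSURANCE_STRATEGIES)
print(assurance_eq)  # [('Stag', 'Stag'), ('Hare', 'Hare')]
print(payoff_dominant(assurance_eq, assurance_payoff))  # [('Stag', 'Stag')]
print(worst_case(assurance_payoff, "Hare", ASSURANCE_STRATEGIES))  # 1
print(pure_nash_equilibria(pd_payoff, PD_STRATEGIES))  # [('Defect', 'Defect')]
```

In the Prisoner's Dilemma case, the single equilibrium (Defect, Defect) pays 1 to each player, whereas the non-equilibrium outcome (Cooperate, Cooperate) pays 3 to each, which is the payoff-dominated structure described above. The worst-case check shows why Hare carries no risk: it pays 1 unit whatever the partner does, whereas Stag can pay 0.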
Nonhuman primates (hereafter primates) make an interesting comparison group for understanding the evolution of choice behavior in social contexts because they typically live in large and flexible social groups in which there are many opportunities for both affiliative and aggressive behaviors (Smuts, Cheney, Seyfarth, Wrangham, & Struhsaker, 1987). Humans and other primates also share several decision-making characteristics, including negative responses to inequity (Brosnan & de Waal, 2003), loss aversion (Chen, Lakshminarayanan, & Santos, 2006), and an endowment effect (Brosnan et al., 2007; Lakshminarayanan, Chen, & Santos, 2008). Moreover, evidence indicates that primates are flexible in their social decision-making and, in particular, are sensitive to their partners' identity and behavior in cooperative contexts. For example, chimpanzees prefer tolerant over non-tolerant partners in a cooperative task (Melis, Hare, & Tomasello, 2006). Capuchin monkeys (Cebus) prefer to cooperate and share rewards with kin (de Waal, 1997), with conspecifics that have previously reciprocated in a cooperative act (de Waal & Berger, 2000; de Waal & Davis, 2003), and with those that do not monopolize access to a better reward (Brosnan, Freeman, & de Waal, 2006). There is also evidence that rhesus monkeys are sensitive to partner identity and rank within their own social groups (e.g., Gouzoules, Gouzoules, & Marler, 1984). Given these and other findings, primates should be sensitive to different types of partners, flexibly altering their behavior to maximize their own payoffs. Sensitivity to changes in a partner's behavior, or to different response patterns among various partners, increases the potential for maximizing one's payoffs because one can decrease time spent (and wasted) with partners that are likely to defect or in situations in which payoffs are not maximized by cooperating. Nonhuman primates, therefore, can provide important insights into human economic decision-making by participating in dynamic game situations that are similar to those typically used with humans (cf. Kalenscher & Van Wingerden, 2011).
We previously investigated performance in the Assurance game among four primate species: humans (Homo sapiens), chimpanzees (Pan troglodytes), rhesus monkeys (Macaca mulatta), and tufted capuchin monkeys (Cebus apella; Brosnan et al., 2011, 2012). Also known as the Stag Hunt game, the Assurance game is often used to model social interactions in the form of a simple 2 × 2 normal-form game of mutual coordination in which payoffs are contingent upon both players' responses, such that they must coordinate their decisions to maximize rewards (Skyrms, 2003). Each individual chooses between a Stag and a Hare response. Coordinating on Stag results in the payoff-dominant Nash equilibrium because it yields the maximum possible payoff for each individual; consequently, any outcome other than Stag-Stag results in a lower payoff for both players. Coordinating on Hare results in the risk-dominant Nash equilibrium because it avoids a zero payoff; however, it yields a lower payoff for both individuals than coordinating on Stag would. In a mixed Stag-Hare outcome, the player who chose Stag receives nothing, while the player who chose Hare receives the lower payoff. Thus, there is an incentive to coordinate on either response, and especially on Stag. This game also provides a relatively species-fair approach to examining the evolution of decision-making across the primates because methodologies (e.g., instructions, experience, and payoff delivery) can be held as consistent as possible across all species tested, providing an equivalent experimental approach.
In the initial study with humans, chimpanzees, and capuchin monkeys, we found that humans outperformed both nonhuman primate species in that a higher percentage of pairs coordinated on Stag (humans: 5/26 exclusive pairs; chimpanzees: 2/14 non-exclusive pairs; capuchins: 1/6 non-exclusive pairs; Brosnan et al., 2011). Thus, at least some individuals from all species achieved this outcome, suggesting continuity in decision-making outcomes in this coordination game, as well as individual differences in decision-making in all species tested. In this first experiment, subjects could see each other's choices before making their own decision. Thus, decisions for all species may have been improved by the availability of exogenous cues (i.e., information about the partner's choice) before a player had to make his or her own choice. The simplest way for this to occur was for subjects to match the partner's response. To test this further, we used a computerized version of the task in which we compared outcomes in a condition in which choices were revealed as they were made to a condition in which choices were hidden until both partners had selected an option. Rhesus monkeys' outcomes were more similar to humans' than either species' outcomes were to capuchin monkeys', with both humans and rhesus monkeys readily achieving the payoff-dominant outcome in the absence of exogenous cues, whereas capuchin monkeys failed to do so without these cues (Brosnan et al., 2012). Thus, humans and rhesus monkeys apparently did not use a simple strategy such as matching a partner's response, although we were unable to determine how these species might respond to variable partner play because neither species varied its response strategy in this version of the game.
Although economic games are typically played in a social environment, we know that animals can track and rapidly respond to shifting rates of reinforcement in their environments outside of a social context (e.g., Davison & Baum, 2000; Gallistel, Mark, King, & Latham, 2001; Herrnstein, 1958; Kacelnik, Krebs, & Ens, 1987; Mark & Gallistel, 1994). In addition, many species show evidence of probability matching, in which the distribution of an animal's responses among concurrent choices corresponds closely to the reinforcement rates for those choice alternatives (e.g., Columbidae: Bullock & Bitterman, 1962; Herrnstein, 1961; Tilapia macrocephala: Behrend & Bitterman, 1961; Bos taurus: Matthews & Temple, 1979; Macaca mulatta: Lau & Glimcher, 2005; Wilson, Oscar, & Bitterman, 1964). However, this ability to probability match may differ in tests designed as multi-player competitive games against computer-simulated opponents. For example, in earlier work, rhesus monkeys' selections among choice options were not in line with probability matching but instead were influenced by the choice history of the two 'players' involved (Lee, Conroy, McGreevy, & Barraclough, 2004). One of the goals of the current study was to link these two bodies of research (i.e., reinforcement learning and decision-making), as they are relevant to our continuing interest in cross-species performance in strategic economic games.
As stated earlier, our previous research demonstrated that rhesus monkeys could achieve the payoff-dominant outcome at a level comparable to that of human adults (Brosnan et al., 2012), but we know little about how similarly these two species would respond to varying strategies in the same environment. In the previous study, once the monkeys (and humans) learned that a high level of reward accompanied Stag play, they rarely, if ever, chose the Hare response. Thus, there was no opportunity to determine whether the monkeys (or humans) could flexibly adapt to a changing pattern of response by their partners. To address this, we designed a set of simulations (SIMs) for the current study that were programmed to employ a variable set of response patterns, operationally defined as probabilities of playing the Stag response. This was an explicitly nonsocial version of the Assurance game in which we explored the relation between monkeys' and humans' patterns of choice behavior and the different reinforcement contingencies produced by simulated probabilistic responders. Subjects of both species were tested alone, but in a setting in which they had previously been tested with a partner, to emphasize that they were not playing a conspecific. We anticipated that the monkeys (and humans) would easily solve the task without the presence of a partner. Additional studies will be needed to determine the degree to which the presence of a partner may influence subjects' original pattern of play, as our study was not designed to address this question.
We had two hypotheses. First, we expected that rhesus monkeys and humans would engage in differential play against each of the different SIMs, based on the monkeys' and humans' previous proficiency on this task when paired with conspecifics (Brosnan et al., 2012). Second, we predicted that this differential play against the varying SIMs would allow both species to obtain more rewards than would occur with random responding or with rigid use of only one response choice (e.g., consistently playing only Stag or only Hare). We suspected that the monkeys would be somewhat more biased toward the Stag response than humans, given these monkeys' prior history of predominantly choosing Stag when playing this task with real monkeys, although we also controlled for this to the degree possible by giving humans experience playing with a real partner, comparable to humans' experience in Brosnan et al. (2012), before they engaged with the SIMs.

Subjects
Eight adult male rhesus monkeys were tested. All monkeys were housed at the Language Research Center of Georgia State University. During each testing session, they were moved to the specially designed paired testing area used in the previous Assurance game study (Brosnan et al., 2012), and they were returned to their normal home rooms after each session. Unlike in Brosnan et al. (2012), where monkeys were tested directly next to each other and shared a computer screen on which they worked, monkeys in this experiment were tested individually and were alone in the "paired testing" room during testing. This was done to emphasize that they were playing alone in the current study, in contrast to the procedure in the previous study. In the current test, they "shared" the computer screen with the SIM but, as in the previous study, controlled their own choices on their half of the screen. They were given their normal daily diet of fruits, vegetables, and primate chow each day, regardless of the amount of work they completed during test sessions, and so they were not food- or water-deprived for testing.
Thirty undergraduate students were recruited via email from the student body at Chapman University, Orange, CA, USA. Subjects were paid $7 for showing up on time, plus their experimental earnings. The criterion for inclusion was participation in at least one economic experiment prior to the current study, so as to facilitate understanding of the testing setup and payment delivery system (note that subjects were never deceived and could always keep their earnings in studies at the economics lab). Participation in any previous normal-form game, including the Assurance game, disqualified subjects. Accordingly, subjects had not participated in our earlier work (Brosnan et al., 2012). As with the monkeys, we wished to emphasize that they were not playing another partner in the current game, but without telling them so, because the monkeys could not be told. Therefore, we initially tested them as a pair and then moved them down the hall to a similar setup in which they were alone in the room (see Human Procedure, below, for additional details on the humans' procedure).

Apparatus
The monkeys were tested using the Language Research Center's Computerized Test System (LRC-CTS; described in Rumbaugh, Richardson, Washburn, Savage-Rumbaugh, & Hopkins, 1989; Washburn & Rumbaugh, 1992). This consisted of a personal computer, a digital joystick, a 17-in. LCD color monitor, and a pellet dispenser. Monkeys viewed the monitor from a distance of approximately 30 to 60 cm, depending on where in the test cage each monkey preferred to sit as it worked.
Monkeys had previously learned to manipulate the joystick to produce isomorphic movements of a computer-graphic cursor on the screen. Contacting appropriate stimuli with the cursor earned them 94-mg fruit-flavored chow pellets, delivered by a dispenser interfaced to the computer through a digital I/O board.
Human subjects were tested using a setup similar to that used with the monkeys. It included a personal computer, a computer mouse (for testing), a joystick (for pre-testing), and a 22-in. LCD color monitor. Participants were seated in front of a computer in individual study carrels and were alone in the laboratory to emphasize that they were not playing with a real person. Contacting appropriate stimuli earned them nickels. These nickels were not distributed immediately; instead, the monetary reward for each choice was displayed above the selected stimulus and added to a running total that was also displayed on the screen. The final total was paid to the subjects in full at the conclusion of the study.
The task. On every trial, each individual chose between the Stag and Hare responses. Playing Stag led to four units of reward if the SIM also played Stag (Stag-Stag) but to a zero payoff if the SIM played Hare (Stag-Hare). Playing Hare always led to one unit of reward, regardless of whether the SIM played Hare (Hare-Hare) or Stag (Hare-Stag).
We introduced six different SIMs, which varied in their choice of the Stag response. Five of these SIMs chose the Stag response with a probability of 0.0, 0.25, 0.50, 0.75, or 1.00 on each trial. The final SIM played a tit-for-tat (TfT) strategy, choosing whatever response the monkey or human had made on the previous trial and choosing Stag or Hare randomly on the first trial of a block. Hereafter, these SIMs are referred to as the 0.0, 0.25, 0.50, 0.75, 1.0, and TfT SIMs, with the numerical labels indicating each SIM's probability of Stag play.
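The six SIMs and the payoff rule above can be sketched as follows. This is a minimal Python illustration, not the LRC-CTS software used in the study; the class and function names are hypothetical, and the 50/50 draw on the TfT SIM's first trial is an assumption, because the text states only that this choice was random.

```python
import random
from typing import List, Optional

STAG, HARE = "Stag", "Hare"


def payoff(subject: str, sim: str) -> int:
    """Subject's reward units: Stag-Stag = 4, Stag-Hare = 0, Hare = 1 regardless."""
    if subject == STAG:
        return 4 if sim == STAG else 0
    return 1


class ProbabilisticSIM:
    """Plays Stag with a fixed probability on every trial, ignoring the subject."""

    def __init__(self, p_stag: float, rng: random.Random):
        self.p_stag = p_stag
        self.rng = rng

    def choose(self, previous_subject_choice: Optional[str]) -> str:
        # random() is in [0, 1), so p_stag = 0.0 never plays Stag and 1.0 always does.
        return STAG if self.rng.random() < self.p_stag else HARE


class TitForTatSIM:
    """Repeats the subject's previous choice; random on the first trial of a block."""

    def __init__(self, rng: random.Random):
        self.rng = rng

    def choose(self, previous_subject_choice: Optional[str]) -> str:
        if previous_subject_choice is None:
            return self.rng.choice([STAG, HARE])  # assumed 50/50 first-trial draw
        return previous_subject_choice


def play_block(sim, subject_choices: List[str]) -> int:
    """Total reward for a fixed sequence of subject choices against one SIM.

    The SIM only sees the subject's previous choice, mirroring the task in
    which neither player could see the other's current selection.
    """
    total = 0
    previous = None
    for choice in subject_choices:
        total += payoff(choice, sim.choose(previous))
        previous = choice
    return total


rng = random.Random(42)
print(play_block(ProbabilisticSIM(1.0, rng), [STAG] * 60))  # 240
print(play_block(ProbabilisticSIM(0.0, rng), [HARE] * 60))  # 60
```

Against the TfT SIM, a subject who always plays Stag earns 4 units on every trial after the first, which makes "always Stag" a best response to that SIM.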
Design and procedure. The experimental task was presented on a computer monitor (Figure 1). Two icons appeared on each side of a split computer screen, one at the top and one at the bottom (positions were randomized across trials, so that the arrangement on the two sides sometimes matched and sometimes did not), and a cursor was centered between the two icons on each side. One icon represented Stag (a red square) and the other represented Hare (a blue circle). These were the same stimuli used in Brosnan et al. (2012). Subjects always used the left side of the screen, and the SIMs' choices were always depicted on the right side. Subjects made a choice either by deflecting a joystick upward or downward to select one of the two icons or by clicking one of the icons with the mouse, after which both icons on their side of the screen disappeared. The SIM made its selection of one of its two icons (based on its pre-set play strategy; see above) one second after the stimuli appeared, and then both icons on its side of the screen disappeared. Once the subject and the SIM had both responded, the selected icons simultaneously reappeared in the center of their respective sides of the monitor. Thus, subjects could not know which icon the SIM had selected before making their own response, and therefore could not match the SIM's play on a given trial. Both selected icons remained visible for 3 s, during which time the rewards (food pellets for monkeys; money for humans) were allocated, and then the screen went blank for 5 s before the next trial was presented.
Non-human primate procedure. Each experimental block within a session consisted of 60 trials in which the SIM played according to one of its six predetermined strategies. This was followed by a 20-min inter-block interval during which the screen was blank. At the end of that interval, a new 60-trial block was presented, with a new SIM strategy implemented. This 20-min delay was a highly salient event for these monkeys and functioned as a signal that a major change had just occurred. These monkeys typically have continuous access to the computer system, and they play different computerized games throughout the day to provide data for different research teams. Changes in the nature of the tasks in which they are engaged often are signaled by the completion of one task during a test session and a delay before the next task is presented. During this time, the screen is either blank or returns to the desktop, and the monkey cannot interact with the screen. We made use of the monkeys' extensive experience with this situation, and the 20-min inter-block delay functioned much like the completion of a task in other circumstances. We did not cue the monkeys more overtly about a potential change in how the SIM would operate in a new block (e.g., by changing the background color) because this would have been a discriminative cue that was not available in Brosnan et al. (2012). The six SIM strategies were exhausted within each 6-block run of the program in a daily session, with strategy order within those blocks randomly determined. Monkeys worked on the task for 4 to 6 hr per session, at most once per day and usually once per week, completing as many blocks as their motivation to engage with the task allowed.
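The randomization described above (each 6-block run uses all six SIM strategies once, in a random order) amounts to shuffling the six strategies without replacement for every run. A minimal Python sketch follows; the labels and function names are hypothetical, and the code is not the program used in the study.

```python
import random
from collections import Counter

SIM_LABELS = ("0.0", "0.25", "0.50", "0.75", "1.0", "TfT")


def run_order(rng: random.Random) -> list:
    """One 6-block run: every SIM strategy exactly once, in random order."""
    order = list(SIM_LABELS)
    rng.shuffle(order)
    return order


def block_schedule(n_runs: int, rng: random.Random) -> list:
    """Concatenate independent runs; a fresh random order is drawn for each run."""
    schedule = []
    for _ in range(n_runs):
        schedule.extend(run_order(rng))
    return schedule


schedule = block_schedule(5, random.Random(7))
print(Counter(schedule))  # each of the six SIMs appears exactly 5 times
```

Because every run exhausts all six strategies, completing five full runs yields exactly five blocks against each SIM.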
When monkeys ended testing one day and began on another day, a new randomized SIM order was established for that session. Monkeys continued to be tested until they had completed five blocks against each of the six possible SIMs, and for each SIM only the first five blocks against that SIM were included in the analyses. Although blocks were set at 60 trials, a test session sometimes ended in the middle of a block; these partial blocks were still included in the analyses and counted as one of the five blocks against that particular SIM. Because of a computer malfunction, two blocks against the 0.0 SIM were lost for one monkey (Chewie), and so only three blocks against that SIM were included in the analyses.
Human procedure. The humans' procedure was as analogous to the monkeys' procedure as possible. Because all of the monkeys had previous experience finding the Stag outcome, it was critical that the humans also have previous paired experience with the game. Thus, prior to the SIM procedure described above, each pair of humans completed a pre-testing session in which they worked for coins delivered by a coin dispenser (analogous to the monkeys' pellet dispenser) in the same 4:1 payoff ratio and using the same procedures as in Brosnan et al. (2012). Humans kept all of the money that they earned during the testing session (in both the paired and the SIM procedures), as well as their $7 payment for showing up on time.
As with our earlier studies, we limited the humans' instructions to the bare minimum in order to more closely mimic the experience of the monkeys, who had to figure out the contingencies of the task by interacting with it. Our previous work demonstrated that this approach worked well with humans and provided valuable results for comparison. The humans' instructions were limited to the following points, given prior to participation:
- Have you participated in an economic experiment before? (Players had to reply 'yes' to participate.)
- In this experiment, you will be making decisions using a joystick attached to a computer.
- As the experiment progresses, you may be paid in dimes by the machines next to your computer.
- Please collect the coins in the cups provided so as to not clog up the machines.
- These are the only instructions you will receive in the experiment.
- Once the experiment begins, the experimenter will not be allowed to answer any questions until the experiment is over.
- Do you have any questions before the experiment begins?
Phase 1: Pre-Test Experience with Human Partner. Human participants first played a 40-trial experimental block with a human partner. This experience was necessary to provide some degree of real-partner experience comparable to that of the rhesus monkeys, which had played the Assurance game with a monkey partner in earlier studies (Brosnan et al., 2012). Because of a computer malfunction, one pair of humans completed 39 trials instead of 40.
Partners were seated next to one another at a single computer, and each had access to their own digital joystick. Contacting appropriate stimuli earned them coins delivered by a coin dispenser. The computerized procedure was identical to the monkey task; however, the first two trials of this paired session familiarized the humans with making responses and controlling the cursor on their respective sides of the screen. In these two trials, before the actual test started, humans moved the cursor to a large green rectangle, once at the top and once at the bottom of the screen, with no time constraint on the first trial and a 15-s limit on the second. Monkeys did not complete these trials because they were already experienced with this task from Brosnan et al. (2012).
Phase 2: Human versus SIM Testing. Next, participants moved to individual test stations to complete their blocks against the different SIMs. This physical relocation was important to emphasize to the humans that they were no longer playing with a human partner, a change that was emphasized to the monkeys by moving them to their "paired testing" location but then testing them without a partner and without any other monkey in the same room. In this phase, participants used a mouse rather than a joystick to select stimuli.
Each experimental block consisted of 40 trials in which the SIM played according to one of its six predetermined strategies. Fewer trials were used so that humans would have time to complete playing against each SIM within one testing session, and to match the number of trials humans received in each session in Brosnan et al. (2012). After each block, a 30-s inter-block interval occurred during which the screen was blank. At the end of that interval, a new 40-trial block was presented, with a new SIM strategy implemented. The SIM order was randomly determined without replacement until all six SIM strategies were exhausted. No human participated in more than one human pairing or played more than once against each SIM. When playing the SIMs, one unit of reward was 5 cents. Participants could exchange all experimental earnings for larger bills at the conclusion of the study.

Results
When seated next to and playing with a real human partner, the current human participants performed similarly to the humans in Brosnan et al. (2012). In that study, 81.48% of pairs (22 of 27) ultimately achieved the payoff-dominant Stag-Stag outcome, meaning that both players selected the Stag response in at least 7 of the last 10 trials. In the current study, 66.67% of pairs (10 of 15) ultimately achieved this payoff-dominant outcome. The frequency of pairs achieving the Stag-Stag outcome did not differ significantly between the two studies, χ²(1, N = 42) = 1.17, p = 0.28.
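The coordination criterion and the between-study comparison can be reproduced with a short Python sketch. Two assumptions are made: the criterion is read as at least 7 Stag-Stag outcomes among a pair's last 10 trials, and the test is the uncorrected Pearson chi-square on the 2 × 2 table of pairs that did or did not reach the criterion in each study. With those choices, the sketch returns the reported values. Function names are hypothetical.

```python
import math


def reached_stag_stag(pair_trials, window=10, threshold=7):
    """True if at least `threshold` of the last `window` trials were Stag-Stag.

    `pair_trials` is a list of (choice_a, choice_b) tuples in trial order.
    """
    last = pair_trials[-window:]
    joint_stag = sum(1 for a, b in last if a == "Stag" and b == "Stag")
    return joint_stag >= threshold


def pearson_chi_square_2x2(table):
    """Uncorrected Pearson chi-square for a 2 x 2 table; returns (chi2, p), df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, chi2 is a squared standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: Brosnan et al. (2012), current study; columns: reached criterion, did not.
chi2, p = pearson_chi_square_2x2([[22, 5], [10, 5]])
print(f"chi2(1, N = 42) = {chi2:.2f}, p = {p:.2f}")  # chi2(1, N = 42) = 1.17, p = 0.28
```

Applying a Yates continuity correction would give a smaller statistic, so the match above suggests, but does not confirm, that the uncorrected test was used.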
Figure 2 illustrates the overall selection patterns of each species for each of the SIMs presented. This figure includes all trials against each SIM in the calculation of the mean percentage of Stag choices. However, we assumed that performance early in a block likely differed from performance late in the block, as subjects learned more about the outcomes of their Stag and Hare play against each SIM. For each probabilistic SIM, the data were therefore divided into the first 20 versus the last 20 trials of the 60-trial block for monkeys, or the first 15 versus the last 15 trials of the 40-trial block for humans. The resulting data are presented in Figure 3. We used a mixed-model ANOVA to determine the potential impact of SIM type, species, and trial block on choice behavior, measured as selection of the Stag response. We excluded the TfT SIM from these analyses because it was qualitatively different from the linear progression of the remaining SIMs (however, we included this SIM in Figure 2 for a visual depiction of performance). Trial block was included as a factor so that we could examine performance as a function of early and late trials within blocks, as defined above. There were main effects of SIM, F(4, 144) = 29.50, p < 0.001, η² = 0.45, and species, F(1, 36) = 9.50, p = 0.004, η² = 0.21, but not of block, F(1, 36) = 2.77, p = 0.11, η² = 0.07. The two-way interaction of SIM and species was significant, F(4, 144) = 12.70, p < 0.001, η² = 0.26, indicating that the effect of SIM on strategy differed between monkeys and humans. The two-way interaction of SIM and block was also significant, F(4, 144) = 10.80, p < 0.001, η² = 0.23, indicating that the effect of SIM on Stag play differed between early and late trials. There was no interaction of species and block, F(1, 36) = 0.0, p = 0.99, η² = 0.00, and no three-way interaction, F(4, 144) = 0.98, p = 0.42, η² = 0.027.
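Two parts of this analysis are easy to make concrete. The Python sketch below (a) extracts the early and late windows of a block and computes the proportion of Stag choices in each, and (b) recovers each η² from its F ratio and degrees of freedom. Part (b) assumes that the reported values are partial eta squared, η²p = F·df_effect / (F·df_effect + df_error); under that assumption, the sketch reproduces the values reported for SIM, species, and their interactions with species and block. Function names are hypothetical.

```python
def early_late_stag(choices, window):
    """Proportion of Stag choices in the first and last `window` trials of a block.

    Use window = 20 for 60-trial monkey blocks and window = 15 for 40-trial
    human blocks.
    """
    def proportion_stag(trials):
        return sum(choice == "Stag" for choice in trials) / len(trials)

    return proportion_stag(choices[:window]), proportion_stag(choices[-window:])


def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error, reported eta squared) for four of the reported effects.
reported = {
    "SIM": (29.50, 4, 144, 0.45),
    "species": (9.50, 1, 36, 0.21),
    "SIM x species": (12.70, 4, 144, 0.26),
    "SIM x block": (10.80, 4, 144, 0.23),
}
for effect, (f, df1, df2, eta) in reported.items():
    print(effect, round(partial_eta_squared(f, df1, df2), 2), "reported:", eta)
```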
Because of these two-way interactions, we next examined the simple main effects to determine where differences in choice behavior occurred for each SIM and each block. For monkeys, Stag play decreased from the early to the late block for the 0.0 SIM, t(7) = 2.83, p = 0.025, and for the 0.25 SIM, t(7) = 2.76, p = 0.028. However, Stag play increased from the early to the late block for the 0.75 SIM, t(7) = -3.64, p = 0.008. Choice behavior did not differ from the early to the late block for the 0.50 SIM, t(7) = -1.36, p = 0.22, or for the 1.0 SIM, t(7) = -1.74, p = 0.13. For humans, Stag play decreased from the early to the late block for the 0.0 SIM, t(29) = 6.24, p < 0.001, and the 0.25 SIM, t(29) = 2.33, p = 0.027, but increased from the early to the late block for the 0.75 SIM, t(29) = -3.07, p = 0.005, and the 1.0 SIM, t(29) = -2.67, p = 0.01. Stag play did not differ from the early to the late block for the 0.50 SIM, t(29) = 1.31, p = 0.20.
A visual examination of the proportion of Stag play for monkeys and humans for the last block of trials across the various SIMs indicated that humans more closely approximated the probabilities of Stag responding by the SIMs than did the monkeys.Correlational analyses confirmed this.The correlation of the proportion of Stag play by humans compared to the SIM probabilities of Stag play was statistically significant, r(3) = 0.991, p = 0.001, while for the monkeys it was not, r(3) = 0.853, p = 0.07.The proportion of Stag play in the SIM accounted for 98% of the variance in the human proportions of Stag play, whereas for monkeys it accounted for only 73% of the variance.Thus, although both species varied their proportion of Stag play in line with that of the SIMs by the end of trial blocks, humans came much closer to matching the probabilities of Stag responding by the SIMs.
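The variance figures follow from squaring r, and the p values can be checked from r and its 3 degrees of freedom (five SIM probabilities, n − 2 = 3). The Python sketch below converts r to a t statistic, t = r√df / √(1 − r²), and uses the closed-form Student t distribution for 3 degrees of freedom so that it needs only the standard library. The function names are hypothetical, and the closed form applies only to df = 3.

```python
import math


def variance_explained_percent(r):
    """Share of variance accounted for (r squared), as a rounded percentage."""
    return round(100 * r * r)


def correlation_p_value_df3(r):
    """Two-tailed p for a Pearson r with df = 3 (five data points).

    Uses the exact Student t CDF for 3 df: with theta = atan(t / sqrt(3)),
    P(|T| < t) = (2 / pi) * (theta + sin(theta) * cos(theta)).
    """
    df = 3
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    theta = math.atan(t / math.sqrt(df))
    return 1 - (2 / math.pi) * (theta + math.sin(theta) * math.cos(theta))


for species, r in (("humans", 0.991), ("monkeys", 0.853)):
    print(species, variance_explained_percent(r), round(correlation_p_value_df3(r), 3))
# humans 98 0.001
# monkeys 73 0.066
```

The monkey value of about 0.066 rounds to the reported p = 0.07, which is why that correlation falls just short of significance at the 0.05 level.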
Finally, we examined overall efficiency in playing the Assurance game by calculating the mean number of rewards (pellets for monkeys and nickels for humans) earned per trial for each SIM.This measure provided a means of assessing whether either of the two patterns, that seen in monkeys (a strong early bias towards Stag play with only moderate shifts to greater Hare play for some SIMs) or that seen in humans (an increasingly closer approximation of probability matching from early to late blocks), was more lucrative in generating reward.In fact, they were virtually identical in their reward outcomes.Across all SIMs (excluding TfT), human players averaged 1.98 nickels per trial and monkeys averaged 1.91 pellets per trial.An independent samples t-test confirmed that this difference was not statistically significant, t(36) = 0.57, p = 0.57.
Modeling. The payoff structure of the Assurance game determines the degree to which Stag is the best response for a subject playing against a probabilistic SIM. To demonstrate this (it can also be established analytically), we modeled the performance of different hypothetical players faced with five of the SIMs that had been presented to the monkeys (the 0.0, 0.25, 0.50, 0.75, and 1.00 Stag-playing SIMs). We left out the TfT SIM because its play varied depending upon the subject's choice. Each hypothetical player chose Stag with a fixed probability ranging from 0.01 to 1.00, with 100 players at each of the 100 probability levels. These hypothetical players completed 300 trials with each of the five SIMs, as the monkeys had. On each trial, the model drew the player's choice (Stag or Hare) according to that player's probability of Stag play, drew the SIM's choice according to its pre-set strategy, and then determined the contingencies (number of rewards earned) for that trial. These data are shown in Figure 4 as the average number of rewards obtained against each SIM by the 100 hypothetical players at each probability of Stag play.
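A minimal Monte Carlo sketch of this model in Python follows. Each trial draws the player's and the SIM's choices independently from their Stag probabilities and scores the outcome with the 4/0/1 payoff rule. The function returns, for each SIM, the mean total reward per player over the block of trials at each Stag probability. The function name, the seed, and the choice to report totals rather than per-trial means are assumptions. With the study's settings (100 levels × 100 players × 300 trials × 5 SIMs, about 15 million trials), pure Python will be slow, so smaller arguments are convenient for a quick check.

```python
import random


def payoff(player_stag: bool, sim_stag: bool) -> int:
    """4 for Stag-Stag, 0 for Stag-Hare, 1 for any Hare choice by the player."""
    if player_stag:
        return 4 if sim_stag else 0
    return 1


def simulate_model(sim_probs=(0.0, 0.25, 0.50, 0.75, 1.00), n_levels=100,
                   players_per_level=100, n_trials=300, seed=0):
    """Mean total reward per hypothetical player, for each SIM and Stag probability.

    Returns {sim_prob: [(q, mean_total), ...]} with q = 0.01, 0.02, ..., 1.00
    when n_levels = 100.
    """
    rng = random.Random(seed)
    results = {}
    for p_sim in sim_probs:
        curve = []
        for level in range(1, n_levels + 1):
            q = level / n_levels  # this player's probability of choosing Stag
            grand_total = 0
            for _ in range(players_per_level):
                for _ in range(n_trials):
                    player_stag = rng.random() < q
                    sim_stag = rng.random() < p_sim
                    grand_total += payoff(player_stag, sim_stag)
            curve.append((q, grand_total / players_per_level))
        results[p_sim] = curve
    return results


quick = simulate_model(n_levels=4, players_per_level=20, n_trials=50)
for p_sim, curve in quick.items():
    print(p_sim, [round(total, 1) for _, total in curve])
```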
A few critical points emerge from this model that could explain the bias towards Stag play shown by the real monkeys. First, when the SIM played Stag with probability > 0.25 (i.e., the 0.50, 0.75, and 1.0 SIMs), the best response was to play Stag 100% of the time. If the SIM played Stag with probability < 0.25 (i.e., the 0.0 SIM), the best response was to play Hare 100% of the time. When the SIM played Stag with a probability of 0.25, there was no best response, because the 4:1 payoff ratio meant that any probability of selecting Stag (or Hare) led to the same expected number of pellets, as shown by the flat line for the 0.25 SIM in Figure 4. Thus, for five of the six SIMs pitted against the monkeys and humans, a rule such as "play Stag every time" would have been the best response (0.50, 0.75, 1.0, and TfT SIMs) or would not have been outperformed by any other strategy (0.25 SIM). Because the SIM with 0.0 probability of playing Stag occurred only once in every six blocks of trials, there was actually little overall cost to playing Stag on a high proportion of trials across all SIMs. Although playing Hare 100% of the time would have produced the maximum possible payoff against the 0.0 SIM, that payoff is still relatively low, at only one unit of reward per trial. The real risk to the player lay in not choosing Stag against the remaining SIMs that played Stag with probability > 0.25, because against all of those SIMs, playing Hare on any proportion of trials greater than zero was worse than playing Stag exclusively.
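The analytic version of this argument can be sketched as follows, assuming (as the 4:1 ratio and the 0.25 indifference point imply) that mutual Stag pays 4 units, Hare pays 1 unit regardless of the SIM's choice, and Stag against Hare pays nothing. Here q denotes the player's probability of choosing Stag and p the SIM's; both symbols are introduced for illustration.

```latex
\begin{aligned}
E(q, p) &= q \cdot 4p + (1 - q) \cdot 1 = 1 + q\,(4p - 1), \\
\frac{\partial E}{\partial q} &= 4p - 1 \;
  \begin{cases}
    > 0 & \text{if } p > 0.25 \quad (\text{best response } q = 1,\ E = 4p), \\
    = 0 & \text{if } p = 0.25 \quad (E = 1 \text{ for every } q), \\
    < 0 & \text{if } p < 0.25 \quad (\text{best response } q = 0,\ E = 1).
  \end{cases}
\end{aligned}
```

Because E is linear in q, the best response lies at an endpoint (always Stag or always Hare) for every fixed-probability SIM except the 0.25 SIM, where every mixture earns the same one unit per trial.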
So if the preference for Stag was so effective, and more effective than probability matching against some SIMs, how did the humans and rhesus monkeys end up with virtually identical earnings per trial? Although a bias towards Stag play paid off against the 0.50 and 0.75 SIMs, players who probability matched would, on average, earn the same number of rewards per trial, because they made up for those losses against the 0.0 SIM, where they largely played Hare. Thus, the payoff matrix affords an equal advantage to a player who is biased to choose Stag against a changing partner (as the monkeys were) and to a player who matches their partner's probability of choosing Stag (as the humans did).
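The offsetting gains and losses can be checked with a short calculation of expected rewards for two idealized players, assuming the same payoffs as in the model (Stag-Stag 4, Stag-Hare 0, Hare 1) and equally frequent fixed-probability SIMs (TfT excluded). The strategies `always_stag` and `matcher` are hypothetical limiting cases, not fits to the monkeys' or humans' data.

```python
# ASSUMED payoffs as in the model: Stag-Stag = 4, Stag-Hare = 0, Hare = 1.
SIM_LEVELS = [0.0, 0.25, 0.50, 0.75, 1.00]  # fixed-probability SIMs; TfT excluded


def expected_reward(q: float, p: float) -> float:
    """Expected reward per trial for a player choosing Stag with probability q
    against a SIM choosing Stag with probability p: q*4p + (1 - q)*1."""
    return 4 * q * p + (1 - q)


def profile(strategy):
    """Per-SIM expected rewards and their unweighted mean (SIMs assumed equally frequent)."""
    per_sim = [expected_reward(strategy(p), p) for p in SIM_LEVELS]
    return per_sim, sum(per_sim) / len(per_sim)


def always_stag(p: float) -> float:
    return 1.0  # idealized limit of a Stag bias, ignoring the SIM


def matcher(p: float) -> float:
    return p    # idealized exact probability matching


for name, strategy in [("always Stag", always_stag), ("matching", matcher)]:
    per_sim, mean = profile(strategy)
    print(f"{name:12s} per SIM: {per_sim}  mean: {mean:.2f}")
```

Always playing Stag earns 0, 1, 2, 3, and 4 units against the 0.0 to 1.00 SIMs, and exact matching earns 1, 1, 1.5, 2.5, and 4; both average 2.0 units per trial. Algebraically, matching minus always-Stag equals (4p − 1)(p − 1) at each SIM: +1 at the 0.0 SIM, −0.5 at each of the 0.50 and 0.75 SIMs, and 0 at the 0.25 and 1.0 SIMs, so the differences cancel exactly.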

Discussion
In the present study, we investigated how responding by rhesus monkeys and humans in the Assurance game changed as a function of variable simulations of probabilistic play. We were interested in whether individual humans and monkeys could track and differentially respond to simulations that varied in the frequency with which they made the payoff-dominant response (i.e., Stag). For both rhesus monkeys and humans, there was a significant difference in the proportion of Stag play as a function of which SIM was active. Specifically, there was a linear relationship between the proportion of Stag play by the subject and the likelihood of Stag play by the SIM, such that monkeys and humans increased their use of the Stag response as the SIM increased its use of the Stag response. These results indicate that subjects of both species changed their behavior in response to the immediately prevailing reinforcement contingencies, and did so in a way that better matched the payoff outcomes of the different SIMs than would responding at chance or using only one response option (Stag or Hare).
Although individuals of both species were apparently sensitive to the different probabilities of Stag responding by the SIMs, humans more closely aligned their proportion of Stag responses to that of the SIMs. Monkeys adopted a different, although equally effective, strategy of exploiting the higher-paying Stag alternative. Interestingly, both species changed their responses within trial blocks as they learned more about the likelihood of Stag play by each SIM, so as to optimize their outcomes over the course of a block. Humans aligned their use of the Stag response with the SIMs early on, but came to match the SIMs' proportion of Stag use even more closely by the end of the block. Monkeys began each block of trials with a high Stag bias (approximately 80%) and, as they gained experience with the reinforcement contingency of the active SIM, adjusted their use of the Stag response accordingly. We modeled these response patterns and found that the humans' and rhesus monkeys' strategies were equally efficient, because there was little overall cost to the monkeys' strategy of playing Stag against all of the different SIMs. For four of the six SIMs (1.0, 0.75, 0.50, and TfT SIMs), the best strategy was 100% Stag play. For a fifth SIM (0.25 SIM), the choice made no difference, and only against the 0.0 SIM did playing Stag guarantee a loss of potential rewards, with 100% Hare play being the best response.
The current findings are interesting in light of the literature on probability matching and optimal response strategies. When faced with repeated choices between two options that produce reward at different probabilities, humans often distribute their responses to each alternative in proportions roughly equivalent to the attendant payoff structure (see Baum, 1979; Vulcan, 2000, for reviews), and that is what the humans in the present study did. However, game theory predicts that individuals should allocate all of their responses to the higher-paying alternative rather than distributing responses across the two choices (Edwards, 1961; Herrnstein & Loveland, 1975; Shanks, Tunney, & McCarthy, 2002), which was closer to the pattern of behavior shown by our monkeys. In a different study, in which rhesus monkeys played a competitive game against computerized simulations, the monkeys' selections were not in line with probability matching but instead were influenced by the choice history of the SIM (Lee et al., 2004), much like the current set of results. Other studies have demonstrated this same effect, noting a difference between nonhuman animal and human performance on related tests. For example, pigeons distributed a higher proportion of their responses to the higher-paying alternative than did humans in procedurally equivalent tasks (Goodie & Fantino, 1995, 1996; Hartl & Fantino, 1996). Some research teams have reported that nonhuman animals respond to probabilistic game-like tasks such as the Monty Hall Dilemma in ways that maximize payoffs better than humans do (e.g., Herbranson & Schroeder, 2010), although this is not always the case (e.g., Klein, Evans, Schultz, & Beran, 2013; Mazur & Kahlbaugh, 2012).
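Why exclusive choice of the richer option outperforms matching can be illustrated with a hypothetical two-choice task (an assumption for illustration, not the present paradigm) in which exactly one option is rewarded on each trial: option A with probability p and option B otherwise. A player who chooses A with probability c is rewarded with probability c·p + (1 − c)(1 − p), so matching (c = p) yields p² + (1 − p)², whereas maximizing (c = 1 when p > 0.5) yields p. The function names are hypothetical.

```python
def hit_rate(p: float, c: float) -> float:
    """Probability of reward when option A is the rewarded one with probability p
    (option B otherwise) and the player chooses A with probability c."""
    return c * p + (1 - c) * (1 - p)


def compare(p: float):
    """Return (matching, maximizing) reward rates for a richer-option probability p > 0.5."""
    matching = hit_rate(p, p)      # choose A exactly as often as A is rewarded
    maximizing = hit_rate(p, 1.0)  # always choose the richer option A
    return matching, maximizing


for p in (0.6, 0.75, 0.9):
    matching, maximizing = compare(p)
    print(f"p = {p:.2f}: matching {matching:.3f}, maximizing {maximizing:.3f}")
```

At p = 0.75, for example, matching is rewarded on 62.5% of trials and maximizing on 75%, which is the sense in which matching forgoes reward in such tasks.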
Here, the monkeys had previous experience with the Assurance game, from which they may have retained a preference for the Stag response (all monkeys achieved and maintained the payoff-dominant Stag-Stag outcome in the earlier experiment; Brosnan et al., 2012). This prior experience with the reinforcement rate of the payoff structure, coupled with the Stag response's high overall likelihood of payoff, could have predisposed the monkeys to a Stag-biased response strategy, in comparison to the more variable responding by humans, who had less prior experience with the Assurance game than the monkeys did. However, before SIM play began, we paired the current human participants with real partners to give them some experience equivalent to that received by humans in Brosnan et al. (2012). Another possibility is that monkeys may tend toward a more exclusive preference for one response option in discrete choice tasks such as the Assurance game than do humans (see Mazur, 2010), although they certainly varied that preference to a statistically significant degree across the different SIMs, and they varied their use of the Stag response within blocks of trials against the different SIMs as they learned about the outcomes of their own responses.
In past research examining human probability matching with dichotomous choice paradigms, adult humans gradually shifted away from probability matching and increased their proportion of choices of the option with the higher individual probability of reward as a function of experience on the task (e.g., Edwards, 1961; Fantino & Esfandiari, 2002; Goodie & Fantino, 1999). Because humans in those studies chose the more richly paying option more often the longer they performed the task, the humans in our study might also have increased their use of the richer Stag response against the SIMs had they been given more trials. Additional research in which humans receive as many trials as these monkeys did would be necessary to determine whether humans come to abandon probability matching and instead adopt the higher-paying Stag-playing strategy. Such data would help further define the similarities (and dissimilarities) between these species in their choice behavior.
What remains to be seen in these kinds of tasks is the degree to which primates rely on and demonstrate the social behaviors (and cognitive processes) that are evident in other kinds of social tests, versus relying solely on reinforcement contingencies, when social partners are present. Whether primates are sensitive to the social context of games in which they face more complex decision-making scenarios should be more fully investigated. Decision-making by social animals rarely occurs in a static, isolated environment; instead, it takes place in a rapidly changing and highly variable setting in which one must respond flexibly to fluctuating situations and individuals. The present results offer some preliminary insights into the nature of responding in the pursuit of optimizing one's own benefits in complex game environments. Future work building on these kinds of economic games can more fully vary the social context that would elicit differential response patterns (and even strategies) in nonhuman players.

Figure 1. Outline of one possible trial. A) Initially, both sides of the screen present the Stag (rectangle) and Hare (circle) stimuli, with one or the other randomly presented at the top of the screen. B) The player can choose at any time; here the player chooses first, at which point that side of the screen (left) goes blank. C) The SIM makes its response 1 second after the choice stimuli appear, and then its side of the screen (right) goes blank. D) Both selections are then shown at the center of the screen, and the player is rewarded (or not) depending on the contingency in place for that particular outcome (here, Stag-Stag).

Figure 2. Mean percentage of trials on which the monkeys (light gray bars) and humans (dark gray bars) selected the Stag response as a function of the SIM strategy with which they were playing. Error bars indicate 95% confidence intervals.

Figure 3.

Figure 4. Results of modeling the reward outcomes for different play strategies against different SIMs. One hundred hypothetical players each completed 300 trials at each of 100 probabilities of choosing Stag against each SIM that was used with the monkeys (excluding TfT). The mean number of pellets obtained by those 100 players is shown at each probability of selecting Stag.