Prosociality, social tolerance and partner choice facilitate mutually beneficial cooperation in common marmosets, Callithrix jacchus

a Human Ecology Group, Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland b Department of Cognitive Biology, University of Vienna, Vienna, Austria c Department of Anthropology, Miami University, Oxford, OH, U.S.A. d Department of Anthropology, Emory University, Atlanta, GA, U.S.A. e Faculty of Social Sciences, Anthropology, University of Helsinki, Helsinki, Finland f Animal Ecology Group, Department of Biology, Utrecht University, Utrecht, The Netherlands

While the importance of social motivations for cooperative behaviour has been thoroughly demonstrated in human research (Fehr & Gintis, 2007; Kurzban, Burton-Chellew, & West, 2015; Tomasello, Carpenter, Call, Behne, & Moll, 2005), their role in explaining nonhuman animal (herein 'animal') cooperation remains controversial and less well established. This disconnect is due in part to the common assumption that psychological mechanisms do not constrain the expression of adaptive animal behaviour, which has been dubbed the 'behavioural gambit' (Fawcett, Hamblin, & Giraldeau, 2013). Despite the utility of this assumption for behavioural ecological research, studies have consistently shown that psychological processes such as motivation, attention and learning play a key role in explaining otherwise unexpected patterns of decision making and behavioural plasticity in both the laboratory and the wild (e.g. Clark & Dukas, 2003; Kedar, Rodríguez-Gironés, Yedvab, Winkler, & Lotem, 2000; McNamara & Houston, 2009). Understanding the structure and effects of these proximate mechanisms is therefore crucial for explaining the diversity of behavioural processes within and across animal species, which may otherwise be unaccounted for by evolutionary ecological theory alone.
One of the most well-studied social motivations in human and nonhuman primates is prosociality, which is defined as a motivation for benefitting or helping social partners, also known as an 'other-regarding' preference. Formally, prosociality is modelled as a parameter controlling the degree to which an individual weighs the observable payoffs of their social partners in their decision-making processes (Akçay et al., 2009). Prosocially motivated individuals are those who positively weigh the payoffs of others in their decisions and thus, all else being equal, are more likely to exhibit actions that increase the observed benefits accrued to social partners. Given that such motivational parameters cannot be directly observed, prosociality must instead be inferred from empirical variation in individuals' expected probability of acting to help or benefit others, as observed under standardized experimental conditions (Burkart et al., 2014;Jaeggi et al., 2010;Massen, Haley, & Bugnyar, 2020). This experimental approach to investigating prosociality is also employed in the present study.
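As a purely illustrative sketch of what 'weighing the payoffs of others' can mean formally, consider a hypothetical linear utility function. The symbols below are assumptions introduced for illustration and are not taken from Akçay et al. (2009): $\pi_i$ and $\pi_j$ denote the observable payoffs of the actor and its partner, and $\rho_i$ denotes the actor's prosociality weight.

```latex
U_i = \pi_i + \rho_i \, \pi_j ,
\qquad \rho_i > 0 \ \text{(prosocial)}, \qquad \rho_i = 0 \ \text{(purely self-regarding)}
```

Under this sketch, an action that yields nothing for the actor ($\pi_i = 0$) but a reward for the partner ($\pi_j = 1$) has positive utility only when $\rho_i > 0$, whereas an action that rewards both individuals ($\pi_i = \pi_j = 1$) is attractive to any actor and simply more attractive to a prosocial one.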
Importantly, because prosociality is hypothesized to specifically regulate the weighing of social partner payoffs, it is expected that prosociality will increase the probability of exhibiting any cooperative behaviour that an animal understands as providing observable benefits to its social partners (Burkart, Hrdy, & van Schaik, 2009). Formal models suggest that generalized social motivations such as prosociality are most likely to evolve in socioecological contexts that generate mutually beneficial outcomes, as well as those that select for complementary action and role specialization, such as in contexts of group hunting, resource defence and cooperative offspring care (Akçay et al., 2009). Consistent with these models, prosociality has been hypothesized to be a key proximate mechanism promoting the initiation and maintenance of cooperation across primate societies (Burkart et al., 2014;Callaghan & Corbit, 2018;Silk, 2007), as well as a central target of selection within social systems reliant on costly forms of helping behaviour and social coordination, such as the societies characteristic of Pleistocene hominins (Isler & van Schaik, 2012;Martin, Ringen, Duda, & Jaeggi, 2020) and other cooperatively breeding animals (e.g. Horn, Scheer, Bugnyar, & Massen, 2016;Massen et al., 2020). Support for the generalized role of prosociality in regulating cooperative behaviour has been found across callitrichid monkeys, who often perform exceptionally well in cognitive tasks requiring social attention and coordination among group members, as compared to primates of similar brain and group size (Burkart et al., 2009). The so-called cooperative breeding hypothesis suggests that these findings can be explained by selection for prosociality to facilitate the helping behaviours characteristic of cooperatively breeding species such as callitrichids. 
Given that prosociality is expected to increase the probability of cooperative behaviour in general, it is also expected that prosociality will, all else being equal, positively associate with performance in any experimental task where individuals can generate food rewards or other benefits for their social partners (Burkart et al., 2014; Burkart et al., 2009; Hrdy, 2009; but see Burkart & van Schaik, 2016; Thornton & McAuliffe, 2015; Thornton et al., 2016). This hypothesis therefore predicts that individual prosocial motivation can explain both intra- and interspecific variation in outcomes such as social coordination and problem solving across a variety of cooperative domains beyond food provisioning and offspring care.
In addition to prosociality, individual differences in social tolerance and partner choice have previously been found to explain variability in cooperative behaviour both within and among a variety of primate taxa (e.g. Hare, Melis, Woods, Hastings, & Wrangham, 2007; Jaeggi & Gurven, 2013; Kaigaishi, Nakamichi, & Yamada, 2019; Melis, Hare, & Tomasello, 2006; Molesti & Majolo, 2016; Sabbatini, De Bortoli Vizioli, Visalberghi, & Schino, 2012). Partner choice effects have also been shown to reflect key hormonal mechanisms regulating social behaviour and performance in cognitive tasks, such as in long-tailed macaques, Macaca fascicularis, where cooperation with closely bonded individuals is accompanied by a reduction in cortisol levels (Stocker, Loretto, Sterck, Bugnyar, & Massen, 2020). Social tolerance refers to an individual's tendency to remain in proximity to a conspecific during potentially competitive situations, such as when an individual co-feeds on a limited resource near a group member without attempting to displace them. Partner choice broadly describes the tendency of particular individuals to engage in cooperative acts together more frequently than other possible pairings within their social group. Importantly, some authors have hypothesized that prosociality may be a by-product of selection for increased social tolerance (Hare, Wobber, & Wrangham, 2012), in contrast to the predictions of the cooperative breeding hypothesis (Burkart et al., 2014). This suggests that the effects of prosociality on cooperation may be better explained by individual differences in social tolerance per se. The expression of helping behaviour is also influenced by partner preferences and relationship quality (Finkenwirth & Burkart, 2018; Martin & Olson, 2015), further suggesting that the apparent effects of prosociality on cooperation may be a by-product of partner choice in addition to social tolerance.
Directly testing these predictions has remained difficult, however, due to the necessity of disentangling individual variation in these dimensions within a comparable experimental framework.
In the present study, we therefore experimentally compare prosocial motivation, social tolerance, partner choice and cooperative behaviour in common marmosets. Common marmosets are a particularly valuable model system for this investigation because they engage in extensive cooperative breeding (Digby & Barreto, 1993) and have also been found to exhibit relatively high prosociality in previous experimental studies (Burkart et al., 2014;Burkart, Fehr, Efferson, & van Schaik, 2007;Burkart & van Schaik, 2013). Moreover, marmosets have been found to coordinate their behaviour in joint action tasks (Miss & Burkart, 2018), providing the opportunity to examine how prosociality influences success in an experimental cooperative task requiring partner coordination.
To assess prosocial motivation, we tested individuals in a group service food-provisioning paradigm (Fig. 1a), which relied on a seesaw mechanism adaptation (Horn et al., 2016) of a paradigm previously tested in marmosets and other primates (Burkart et al., 2014;Burkart & van Schaik, 2013). In this paradigm, individuals could step on a platform to provision a group member on an adjacent platform without receiving a reward themselves (0/1 payoff). We further employed a cooperative pulling task known as the loose string paradigm (Hirata, 2003;Fig. 1b) to assess social tolerance, partner choice and dyadic cooperation, using an in-group testing procedure (Massen, Ritter et al., 2015). In contrast to the group service paradigm, the loose string paradigm required active coordination within a dyad, as two subjects needed to simultaneously pull on the ends of a tethered string to retrieve food rewards from a sliding platform. Given that success resulted in rewards for both partners (1/1 payoff), we interpreted this paradigm as measuring mutually beneficial cooperation. We predicted that individuals with higher prosocial motivation in the group service paradigm would exhibit higher rates of dyadic cooperation in the loose string paradigm, consistent with the hypothesized benefits of prosociality for achieving social coordination and cooperative problem solving in contexts where individuals can produce benefits for their social partners (Burkart et al., 2009). In addition, we predicted that individual social tolerance and partner choice would also influence dyadic cooperation, and we further tested whether prosociality had an independent effect on mutually beneficial cooperation after accounting for these factors. 
Finally, to examine the generalizability of these potential effects across individuals and social partners, we also controlled for personality differences in sociability and arousal, age, sex and food motivation, all of which have been found to affect primates' task motivation and performance in social behavioural experiments (e.g. Altschul, Wallace, Sonnweber, Tomonaga, & Weiss, 2017; Morton, Lee, & Buchanan-Smith, 2013; Wergård, Westlund, Spångberg, Fredlund, & Forkman, 2016).

Figure 1. (a) The group service paradigm. A see-saw mechanism was fixed to the enclosure of a social group with a provisioning platform (Position 0) extended into the enclosure, which could be stepped on to weigh the platform down, causing a food reward to roll towards the enclosure (Position 1). An additional platform not attached to the see-saw mechanism was placed within the enclosure to facilitate retrieval of this food reward. (b) The loose string paradigm. A moving board was stationed outside the enclosure of a social group, with a tethered string placed within arm's length of two platforms fixed within the enclosure. Two equally sized food rewards were placed on the board across from each platform. Simultaneous pulling on both ends of the string resulted in the board moving forward, facilitating retrieval of the food rewards. Conversely, uncoordinated pulling on either end would result in the string becoming untethered, preventing access to the food rewards.

Subjects and Housing
We tested 23 common marmosets in five social groups (13 males; 10 females; age range 0–12 years, group size range 3–6 individuals; see Appendix, Table A1) at the University of Vienna. Each group enclosure was encased with thin wire mesh, contained opaque barriers to prevent visual contact between adjacent groups and facilitated free movement between indoor and outdoor housing areas (indoor: 250 × 250 × 250 cm; outdoor: 250 × 250 × 250 cm), which both contained an ample supply of enrichment objects and bedding material. We maintained the laboratory temperature between 24 °C and 26 °C, with 40–60% humidity and individual heating lamps for each group, and we employed lighting lamps on a 12:12 h light:dark cycle in addition to natural sunlight. All groups had ad libitum access to water, and monkey pellets were provided daily in addition to nutritionally balanced meals in the morning and afternoon, along with weekly enrichment activities that provided mealworms and marmoset gum.

Ethical Note
This research was approved by the Animal Ethics and Experimentation Board, Faculty of Life Sciences, University of Vienna (No. 2018-013). All housing conditions accorded with Austrian legislation as well as the Callitrichidae husbandry guidelines of the European Association of Zoos and Aquaria. We conducted our experiments within group enclosures without separating any animals from their social groups so as to prevent unnecessary stress during testing. Subjects also received regular meals during the experimental period to prevent an undesirable increase in competition for food rewards and any subsequent social stress.

Experimental Paradigms
The order of paradigm presentation was counterbalanced across groups to control for order effects. As depicted in Fig. 1, all experimental sessions were conducted in the subjects' home enclosures with the entire social group present to enhance the ecological validity of the observed responses. As a result, individuals could spontaneously pair and engage in partner choice with any group members during testing, rather than being placed into specified dyads.

Group service paradigm
Our apparatus used a see-saw mechanism, such that stepping on a provisioning platform (Position 0) would weigh the see-saw down, causing food placed on the outside to roll down towards the fence and become accessible at an adjacent receiving platform (Position 1; see Fig. 1a). Crucially, a subject who landed at Position 0 would need to leave the provisioning platform if it wanted to access the food at Position 1 for itself, which would in turn cause the see-saw mechanism to revert to its original position with the food out of reach. Consequently, an individual was only able to provide food for someone else.
A multistage habituation and training procedure (Horn et al., 2016) was implemented prior to testing to ensure that subjects understood the basic mechanics of the apparatus. This was followed by two experimental phases composed of the group service task ('test' condition), during which individuals could provision their groupmates with a food reward without receiving a reward themselves, as well as two control conditions designed to account for potential effects of stimulus enhancement and food motivation on provisioning behaviour. In the 'empty' control condition, the experimenter approached the apparatus and pretended to place food down on the Position 1 receiving tray; in the 'blocked' control condition, food was placed on the Position 1 tray, but access to the food was blocked by a transparent boundary at the receiving platform. In the blocked control, subjects stepping on the Position 0 platform could therefore cause the food to move closer to the enclosure but could not provision their group members. Trial numbers per experimental session were determined by group size n (i.e. 5 × n) to ensure adequate opportunity for participation across subjects. Trials were a maximum of 2 min in duration. Following previous studies (Burkart & van Schaik, 2013; Horn et al., 2016), we first conducted three training sessions allowing the marmosets to learn about the full contingencies of both test and control set-ups beyond the basic see-saw mechanism trained in previous phases. This was followed by two experimental sessions of each condition, which were used for assessing prosociality. The blocked control was conducted after initial testing and empty control sessions, consistent with prior research (Burkart et al., 2014; Horn et al., 2016), while the test and empty control sessions were repeated sequentially for counterbalancing.
Note that previous research has shown that no order effect occurs when marmosets are retested after the blocked control using the original group service paradigm (Burkart & van Schaik, 2016), nor for azure-winged magpies using the modified paradigm employed in our study (Horn et al., 2016). Please see Appendix for more detailed testing procedures.
Consistent with prior research (Burkart et al., 2007;Burkart et al., 2014;Burkart & van Schaik, 2013;Horn et al., 2016), we considered higher rates of stepping on the provisioning platform during the test compared to the control conditions as providing necessary evidence for interpreting provisioning behaviour as prosocially motivated. Thus, for subjects who clearly differentiated when they could and could not provision groupmates, we considered higher rates of stepping on the provisioning platform to be indicative of higher prosocial motivation. Note that this is a conservative criterion intended to strengthen our inferences about the motivational basis of observed provisioning behaviour, as all individuals participated in extensive training with the apparatus and thus exhibited a basic comprehension of the paradigm prior to test sessions. Therefore, while many individuals did not step on the provisioning platform frequently across any experimental sessions, it is likely that they nevertheless understood the task and simply were not motivated to provide food (or they may have been more motivated to retrieve rather than provision food). As described further below, some subjects also used the provisioning platform more during test sessions but exhibited very little prosocial motivation overall. In contrast, very few subjects used the provisioning platform frequently in both the test and control trials, suggestive of a stimulus enhancement effect (see Appendix, Fig. A3 and Table A3). All subjects were provided standardized daily meals outside of the testing period, and we therefore also used each individual's average number of retrieved food rewards across test sessions as an approximate measure of individual food motivation.
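The differentiation criterion described above can be illustrated with a minimal sketch. It assumes that a subject's behaviour is summarized as raw stepping rates per condition, which is a simplification of the model-based inference used in the study; the class name, function name and example values are hypothetical. No numeric cut-off for 'high' versus 'low' overall provisioning is given in the text, so none is encoded here.

```python
from dataclasses import dataclass


@dataclass
class SteppingRates:
    """Proportion of trials with a step on the Position 0 platform, per condition."""
    test: float     # groupmates could be provisioned
    empty: float    # 'empty' control: no food on the apparatus
    blocked: float  # 'blocked' control: food present but inaccessible


def differentiates_conditions(rates: SteppingRates) -> bool:
    """Return True when the test rate exceeds BOTH control rates.

    This mirrors the necessary condition for interpreting provisioning
    as prosocially motivated; it says nothing about how high the test
    rate is in absolute terms.
    """
    return rates.test > rates.empty and rates.test > rates.blocked


# Hypothetical values for illustration only (not data from the study).
example = SteppingRates(test=0.60, empty=0.10, blocked=0.15)
print(differentiates_conditions(example))
```

A subject with equal stepping rates in the test and either control would fail this check, which corresponds to the stimulus enhancement pattern the controls were designed to detect.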
Loose string paradigm

Social tolerance assessment
Each social group was habituated to the apparatus (Fig. 1b) and underwent an initial social tolerance task used in previous research on cooperation in the loose string paradigm (Massen, Ritter, et al., 2015). Two untethered strings were placed adjacent to one another on the apparatus across 18 sessions of 10 trials each, such that subjects could retrieve a food reward by pulling a string without the help of a partner. By observing how often subjects retrieved the food in proximity to each of their groupmates at the adjacent string, we were able to calculate a previously validated measure (Massen, Ritter, et al., 2015) of average individual social tolerance (see Appendix). In addition, this phase facilitated training the subjects on the basic string-pulling mechanism, as acquisition of a food reward required a subject to pull in a single string from the apparatus towards itself. All subjects succeeded in accessing food in at least 9/18 training sessions, suggesting a basic comprehension of the string-pulling component of the task.

Dyadic task
Following this procedure, each group received 40 test sessions of 10 trials using the tethered string, with a maximum 1 min duration per trial. The rewards could not be accessed during these trials without simultaneous pulling by both partners on their respective ends of the string, thus requiring dyadic coordination in the timing of their pulling. We considered trials in which subjects were able to retrieve these rewards as instances of successful cooperation. In contrast to prior studies utilizing a single clump of food rewards (e.g. Melis et al., 2006), we distributed food evenly on the moving platform so that both partners could retrieve equal rewards. This allowed us to more directly disentangle the effects of individual prosociality and social tolerance on task performance. Moreover, given that prosociality is formally predicted to influence decision making irrespective of personal payoffs (Akçay et al., 2009), this design allowed us to investigate the unique effect of other-regarding preferences while holding the personal benefits of action constant across individuals. Furthermore, while previous research motivated our use of three training sessions across the group service conditions (Burkart & van Schaik, 2013; Horn et al., 2016), no prior studies had yet been done on the loose string paradigm with common marmosets. We therefore did not designate a fixed number of additional training sessions with the full dyadic loose string task. Rather, we provided all groups with a large number of experimental trials (400 loose string trials) to give subjects in all groups ample opportunity to learn the temporal coordination necessary for the dyadic task (see Appendix, Fig. A5). As described below, we controlled for general learning across subjects in all analyses to effectively distinguish between the independent effects of task comprehension and the factors of interest on performance in the dyadic loose string task.
Please see Appendix for more detailed testing procedures. Note that two older adult subjects did not consistently participate in either the social tolerance task or the dyadic loose string paradigm during the entire study period and were therefore excluded from data analysis due to insufficient evidence of task comprehension (see Appendix, Table A1).

Personality Scores
Personality scores from two factor dimensions describing individual differences in sociability (+allogrooming, +contact sitting, +social proximity) and arousal (+activity level, +gnawing, +scent marking) were available from previous research on a subset of our sample (19/23 subjects). These scores were estimated using the Exploratory Graph Analysis + Generalized Network Modeling (EGA + GNM) statistical framework, from focal observational data collected within 3–7 months of the present study. See Appendix for further details on the structure and estimation of these personality dimensions. Note that we here refer to personality as temporally consistent among-individual variation in behaviour, following the behavioural ecological tradition in animal personality research (Dingemanse & Dochtermann, 2013). The sociability and arousal dimensions are therefore statistical constructs quantifying stable patterns of interindividual differences observed across multiple months of behavioural data. We investigated these traits in particular as activity level is expected to predict neophilia among common marmosets (Koski et al., 2017), and previous work has shown sociability to reduce participation in experimental tasks among captive primates (Morton et al., 2013). In addition, many species exhibit increased personality similarity within both friendships and pair bonds (e.g. Gabriel & Black, 2012; Massen & Koski, 2014; Youyou, Stillwell, Schwartz, & Kosinski, 2017). This phenomenon, often referred to as homophily ('love of the same'), has been hypothesized to benefit dyadic cooperation through an enhanced capacity to coordinate and synchronize behaviour with phenotypically similar partners (Laubu, Dechaume-Moncharmont, Motreuil, & Schweitzer, 2016; Massen & Koski, 2014).

Statistical Analysis
Bayesian generalized linear mixed-effects models (GLMMs) were fitted for all analyses using the 'brms' package (Bürkner, 2017) for the R statistical environment (R Core Team, 2018). Weakly regularizing priors ($\beta \sim \mathrm{Normal}(0, 2)$ for fixed effects, $\sigma \sim \mathrm{Half\text{-}Cauchy}(0, 2)$ for random effects and $R \sim \mathrm{LKJ}(2)$ for random effect correlations) were placed on all model parameters to penalize extreme estimates and reduce our risk of inferential error (McElreath, 2016). We modelled responses in the group service paradigm by investigating subject stepping rates at the Position 0 provisioning platform. To assess whether subjects exhibited prosociality in the experiment, we estimated a binomial model with experimental condition (test, empty control, blocked control) as a fixed effect with accompanying random slopes across subjects. We further included a fixed effect for time of day to control for variation in performance across morning and afternoon sessions, as well as random intercepts to account for any unobserved heterogeneity across social groups and observations. Cooperation in the loose string paradigm was investigated using both group level and dyad level success rates per experimental session. We assessed our central hypothesis by comparing rates of successful cooperation between dyads with and without subjects exhibiting prosociality in the group service task. The group level analysis therefore considered the total proportion of successful trials between prosocial and nonprosocial dyads within a group for each experimental session, while the dyad level analysis modelled the proportion of successful trials per experimental session for each possible prosocial and nonprosocial dyad within a social group.
The total proportion of prosocial dyads within each group was controlled for in the group level analysis to ensure that the relative difference in success between prosocial and nonprosocial dyads was not confounded with the baseline probability of a prosocial dyad participating in the task. To examine the effects of partner choice across experimental sessions, we modelled dyadic cooperation in the prior session (number of successful trials/10 trials) as a fixed effect predictor of success in the subsequent session, capturing how well prior cooperation predicted future cooperation. To account for the effect of individual social tolerance on dyadic cooperation, we included an additive effect (i.e. sum of partner values) of individual social tolerance for each dyad. Additional exploratory fixed effects were also estimated for sex combination within a dyad, the additive effects of partners' age, sociability and arousal, as well as for partner similarity in age and personality, calculated on a 0–1 scale as $1 / (1 + |\mathrm{value}_{\text{subject 1}} - \mathrm{value}_{\text{subject 2}}|)$. We also included the additive effect of food motivation within each dyad to ensure that the primary effects of interest were not confounded by experimental motivation across tasks, as well as time of day to control for variation in performance across morning and afternoon sessions. We included random intercepts across dyads and social groups to account for the multilevel structure of our repeated measures data, as well as so-called multimembership random intercepts to appropriately handle repeated observations of individuals nested within multiple dyads (Browne, Goldstein, & Rasbash, 2001). Additional random intercepts were included to capture any unobserved heterogeneity across observations and days of experimentation independent of our fixed effects.
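The dyad level predictors described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code (the analyses were run in R); the function names and example values are hypothetical, and the treatment of the first session (no prior session, hence a missing value) is an assumption.

```python
from typing import List, Optional


def similarity(value_1: float, value_2: float) -> float:
    """Partner similarity on a 0-1 scale: 1 / (1 + |value_1 - value_2|)."""
    return 1.0 / (1.0 + abs(value_1 - value_2))


def additive(value_1: float, value_2: float) -> float:
    """Additive dyadic effect: the sum of the two partners' values."""
    return value_1 + value_2


def lagged_success(successes: List[int], trials_per_session: int = 10) -> List[Optional[float]]:
    """Proportion of successful trials in the PREVIOUS session, per session.

    The first session has no predecessor, so it is marked as None
    (an assumption; the source does not say how it was handled).
    """
    previous = [s / trials_per_session for s in successes[:-1]]
    return [None] + previous


# Hypothetical values for illustration only.
print(similarity(2.0, 4.0))       # partners 2 units apart in age
print(additive(0.4, 0.7))         # summed social tolerance of a dyad
print(lagged_success([2, 5, 7]))  # successes in three consecutive sessions
```

Identical partners receive a similarity of 1, and the score decays towards 0 as the absolute difference between partners grows, so the measure is bounded regardless of the scale of the underlying trait.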
A multiple imputation procedure (van Ginkel, Linting, Rippe, & van der Voort, 2019) was used to account for unreliable identification of two twin juvenile subjects during the loose string task. Additionally, given support for random missingness in a Bayesian imputation model, we used a simpler mean imputation procedure to account for missing personality scores in four subjects. Please see Appendix for further details on our statistical models and procedures.
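For readers who prefer notation, the group service model and priors described in this section can be written out roughly as follows. This is a sketch under stated assumptions: the indexing, the logit link (implied by the log odds reported for the slopes) and the symbol names are introduced here for illustration, and the Appendix remains the authoritative description. Here $y_{sc}$ is the number of steps on the Position 0 platform by subject $s$ in condition $c$ out of $n_{sc}$ trials, $\beta_c$ is the fixed condition effect, $u_{sc}$ is the subject-specific deviation (random slope), $t_{sc}$ indicates time of day, $v_{g[s]}$ is the intercept of the subject's social group and $e_{sc}$ is an observation level intercept.

```latex
\begin{aligned}
y_{sc} &\sim \mathrm{Binomial}(n_{sc},\, p_{sc}) \\
\operatorname{logit}(p_{sc}) &= \beta_{c} + u_{sc} + \beta_{\mathrm{time}}\, t_{sc} + v_{g[s]} + e_{sc} \\
\beta &\sim \mathrm{Normal}(0,\, 2) \\
\sigma &\sim \mathrm{Half\text{-}Cauchy}(0,\, 2) \\
R &\sim \mathrm{LKJ}(2)
\end{aligned}
```

In this notation, $\sigma$ collects the standard deviations of the random effects and $R$ is the correlation matrix among the subject-specific condition effects.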
Rather than relying on null hypothesis tests and discrete designations of statistical significance, we provide multiple measures to summarize and draw inferences from our posterior model estimates (McElreath, 2016; McShane, Gal, Gelman, Robert, & Tackett, 2019). In particular, to interpret the strength and uncertainty of estimated effects, we used the posterior median slope (i.e. the log odds for a 0–1 or 1 SD change; $\beta$), the median absolute deviation (MAD) as a robust measure of statistical uncertainty around the median, the 90% Bayesian credible interval (90% CI) and the probability of observing a positive or negative effect, i.e. the proportion of the posterior greater or smaller than 0 in the direction of the median ($p_+$ or $p_-$). Note that in contrast to classical P values, which consider the probability of observing data under a null hypothesis, $p(\mathrm{data} \mid H_0)$, the reported $p_+$ and $p_-$ directly estimate the probability in support of hypothesized positive or negative effects given the observed data, $p(H_1 \mid \mathrm{data})$. Larger values of $p_+$ or $p_-$ therefore indicate greater support for positive or negative effects, respectively. In addition, we calculated posterior median and MAD estimates of Cohen's d for our fixed effects, which provides a standardized mean difference effect size with values of 0.2, 0.5 and 0.8 traditionally interpreted as small, medium and large effects, respectively. Finally, to avoid overfitting and enhance model generalizability, we used the fully Bayesian Watanabe–Akaike information criterion (WAIC; Gelman, Hwang, & Vehtari, 2014) to select a final model of dyadic cooperation. In particular, we first estimated a model with all parameters included and compared it to a reduced model without fixed effect parameters exhibiting highly uncertain effects ($p_+$ or $p_-$ < 0.90). As with other information criteria, $\Delta\mathrm{WAIC}_{\text{full} - \text{reduced model}} = -2$ provides minimal support for the full model.
Note that this procedure did not meaningfully change inferences about the primary effects of interest (see Appendix, Table A4).
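The posterior summaries described above can be computed from a vector of posterior draws as in the following minimal sketch. It is not the authors' code: the function name and the example draws are hypothetical, the credible interval is assumed to be equal-tailed, and the MAD is left unscaled (some software multiplies it by a constant of about 1.4826, and the source does not say which convention was used).

```python
import statistics
from typing import Dict, List, Union


def summarize_posterior(draws: List[float]) -> Dict[str, Union[float, str]]:
    """Median, MAD, 90% equal-tailed interval and directional probability."""
    median = statistics.median(draws)
    # Unscaled median absolute deviation around the posterior median.
    mad = statistics.median([abs(d - median) for d in draws])
    # n=20 yields cut points at 5%, 10%, ..., 95%; keep the outer two.
    cuts = statistics.quantiles(draws, n=20, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    # Proportion of draws on each side of zero.
    p_pos = sum(d > 0 for d in draws) / len(draws)
    p_neg = sum(d < 0 for d in draws) / len(draws)
    # Report the probability in the direction of the median.
    if median >= 0:
        direction, prob = "p+", p_pos
    else:
        direction, prob = "p-", p_neg
    return {
        "median": median,
        "mad": mad,
        "ci90_lower": lower,
        "ci90_upper": upper,
        "direction": direction,
        "probability": prob,
    }


# Hypothetical draws for illustration only (real posteriors have thousands).
print(summarize_posterior([-1.0, 0.5, 1.0, 1.5, 2.0]))
```

Because the directional probability is read directly off the draws, it can be interpreted as the posterior probability that the effect has the stated sign, which is the distinction from a classical P value drawn in the text.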

Group Service Paradigm
Prosociality is evidenced by higher rates of stepping on the Position 0 platform during test trials, when groupmates could be provisioned, as well as relatively lower rates of stepping on the platform during the empty and blocked controls that assessed potential stimulus enhancement effects. Rates of stepping on the Position 0 provisioning platform were moderate to high across all conditions (average rate: 88% test trials, 37% empty control, 52% blocked control; see Appendix, Fig. A2). On average, the marmosets used the Position 0 platform more during test trials than during empty control trials, in which no rewards were placed on the see-saw, but there was a high degree of uncertainty in the difference between the test and blocked control sessions, in which rewards were placed on the see-saw yet access to these rewards was blocked. However, the stepping rates of 7/23 subjects provided clear evidence of differentiating between conditions with and without opportunities to provision group members (i.e. more use of the Position 0 platform in test trials than both controls; see Appendix, Fig. A3 and Table A3). Of these seven individuals, three nulliparous adult females, one subadult male and one subadult female exhibited high levels of prosociality (see Fig. 2). These 5/23 subjects engaged in appreciably higher rates of provisioning than the other two subjects who evidenced clear understanding of the task but had low test session provisioning rates overall (see Appendix, A4). We therefore considered these five subjects in particular to exhibit strong evidence of prosocial motivation.
Overall, these prosocial subjects successfully provisioned their groupmates in between 24% and 70% of test trials. As expected, there was a resultant negative association between food received during testing and prosociality ($r_{\text{biserial}} = -0.35$). Given that these subjects differentiated between conditions in which their groupmates could and could not receive rewards, this further suggests that these individuals were not merely motivated by the expectation of reward at the apparatus after being provisioned themselves, but were instead motivated to provision their group members. Given that the loose string paradigm is intrinsically a dyadic task, we therefore classified all dyads containing at least one of these prosocial subjects as being prosocial dyads (N = 14/37 prosocial dyads, N = 23/37 nonprosocial dyads; 3–15 total dyads per group).
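The dyad classification rule stated above (a dyad is prosocial if it contains at least one prosocial subject) can be sketched as follows. The function name and subject labels are hypothetical and serve only to illustrate the rule; they are not the study's identifiers.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple


def classify_dyads(
    group_members: Iterable[str],
    prosocial_subjects: Iterable[str],
) -> Dict[Tuple[str, str], bool]:
    """Map every possible dyad in a group to True (prosocial) or False.

    A dyad counts as prosocial when at least one of its two members
    was classified as prosocial in the group service paradigm.
    """
    prosocial = set(prosocial_subjects)
    labels = {}
    for a, b in combinations(sorted(group_members), 2):
        labels[(a, b)] = (a in prosocial) or (b in prosocial)
    return labels


# Hypothetical group of four with one prosocial subject.
labels = classify_dyads(["A", "B", "C", "D"], prosocial_subjects=["A"])
print(sum(labels.values()), "prosocial dyads out of", len(labels))
```

A group of n individuals yields n(n − 1)/2 possible dyads, which is why groups of 3 to 6 individuals correspond to the 3–15 dyads per group reported above.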

Loose String Paradigm
Successful cooperation in the loose string paradigm increased with subsequent experience across experimental trials at the group level ([...] Fig. A5). This indicates that the task required sustained learning among partners beyond the initial single string social tolerance task and further demonstrates that, on average, all dyads became better at coordinating their pulling in the task over time. Independently of these learning effects, and consistent with our central hypothesis, dyads containing prosocial subjects also contributed to an appreciably higher proportion of successful cooperation trials per group session ([...]). When comparing all possible dyads within each group, dyads containing a prosocial subject continued to exhibit a higher average probability of successful cooperation than those without a prosocial subject ([...] Fig. A6). This suggests that the prosociality effect is not due to specific prosocial dyads within each group driving the overall difference in dyad types across group sessions. Indeed, this dyadic prosociality effect remained after controlling for partner choice (as indicated by the association between dyadic success in the previous session and success in the current session; b = 0.[...] 4e). This demonstrates that while the pertinent factors of partner choice and individual social tolerance also influenced cooperation, prosociality nevertheless had an independent positive effect on average cooperative success across social partners. In addition, individuals of similar age also achieved higher rates of success ([...] Fig. 4c). Our final model excluded parameters for personality, total dyadic age, sex combination and time of day, as model comparison suggested that these highly uncertain effects did not enhance model quality (ΔWAIC full − reduced model = 0.00 [SD = 1.30]).

DISCUSSION
We built on previous research by directly testing whether prosociality, social tolerance and partner choice facilitate individuals' capacity to achieve social coordination and problem solving in a mutually beneficial cooperative task. While suggestive, previous attempts to address the role of prosociality in animal cooperation have been limited by the employment of distinct behavioural measures and a reliance on interspecific comparisons, preventing an unambiguous assessment of the benefits of prosocial motivation for cooperation among conspecifics. We found that intraspecific variation in prosociality had a positive association with coordinated cooperation across social partners in general, independently of social tolerance and partner choice. These findings are consistent with the cooperative breeding hypothesis, which predicts that the prosocial motivation underlying costly helping behaviour also increases the probability of achieving coordination and problem solving across cooperative contexts more generally, in addition to other mechanisms such as partner choice and social tolerance (Burkart et al., 2014). It is important to note that we did not examine performance in a task for which cooperative behaviour resulted in clear individual costs. We therefore cannot confidently infer that prosocial motivation as measured in the group service paradigm, where individuals experience negligible costs for food provisioning, would also facilitate coordination and problem solving in a behavioural experiment with a costlier altruistic payoff. Nevertheless, by using a mutually beneficial cooperation task with a large number of repeated trials, we were able to control for the role of individual benefits in motivating behaviour, showing that dyads with prosocial animals still performed better than nonprosocial dyads in securing food rewards for both social partners, independently of learning or other nonsocial motivational processes.
Despite the observed benefits of prosocial motivation for securing food rewards in the loose string task, we found that most of our subjects were not clearly prosocial (see Appendix, Fig. A3), with only 5/23 individuals evidencing a strong preference for provisioning their group members. Interestingly, none of these prosocial subjects were socially dominant within their groups, either being nulliparous adult females or subadult juveniles. While previous research has found higher prosociality among male compared to female helpers (Burkart, 2015), sex differences in marmoset provisioning behaviour have also been somewhat inconsistent across studies (Finkenwirth & Burkart, 2018 [...]) [...] apparent sex effects may reflect more fundamental variation in factors such as relationship quality and stability within groups (Finkenwirth & Burkart, 2017). These components of social integration are expected to predict further social tolerance from dominant individuals, who might otherwise expel subordinates from the group, and may thus function to regulate subordinates' motivation for costly helping behaviour. In this regard, our findings are consistent with the pay-to-stay hypothesis of alloparental care, which predicts that nonbreeders offset their costs to breeders through helping (Erb & Porter, 2017), although this interpretation remains speculative in light of our small sample size. While our study suggests generalized benefits of prosociality for coordination and cooperative problem solving, it therefore also supports previous work emphasizing that social roles and relationships are central determinants of marmosets' prosocial motivation and the expression of helping behaviour (Finkenwirth & Burkart, 2018).
[Figure caption fragment] Estimates are shown for the average subject as well as the five prosocial subjects who exhibited moderate to high rates of provisioning behaviour (see Fig. A3 for plots of all subjects and Table A3 for estimated posterior probabilities and between-condition random slope effects).
Whether the observed benefits of prosociality extend to such forms of cooperation among less familiar individuals thus remains an important question for future investigation, as does the potential role of reciprocity in facilitating food provisioning. In addition, it is likely that the uneven distribution of prosociality across our sample partially reflects the heterogeneous structure of the social groups, which included both prototypical family groups and groups containing multiple unrelated adults (see Appendix, Fig. A4). As is typical of research in captivity, our study thus provides desirable experimental control while also being limited by the reduced evolutionary and ecological relevance of the experimental context (Massen, Behrens, Martin, Stocker, & Brosnan, 2019).
Individual variation in social motivations such as prosociality is consistent with the more general observation that cooperative behaviour tends to be facultative and highly context dependent (Bourke, 2014; Gurven & Winking, 2008). Previous work has, for example, demonstrated the importance of factors such as relatedness (Green, Freckleton, & Hatchwell, 2016; Lukas & Clutton-Brock, 2012), local ecology (Shen, Emlen, Koenig, & Rubenstein, 2017) and social group dynamics (Smith et al., 2016) in the occurrence of cooperation across a variety of taxa. Whenever such cooperative behaviours are proximately influenced by prosocial motivation, we expect that individual levels of prosocial motivation may also be highly contingent. Similarly, while helping in the absence of overt signs of need suggests proactive prosociality (Burkart et al., 2009, 2014; Jaeggi et al., 2010), these motivations may nevertheless be conditional on the provisioner's and the receiver's state (Thornton & McAuliffe, 2015), including the constraints imposed by their social roles. The degree to which human cooperation relies on generalized, proactively prosocial motivations also remains unclear, at least within the context of economic games (Burton-Chellew & West, 2013). Caution should therefore be taken in drawing inferences about levels of species-typical prosociality and cooperation (Decety, Bartal, Uzefovsky, & Knafo-Noam, 2016; Thornton & McAuliffe, 2015), and appropriate consideration should be given to the influence of relevant social and ecological factors on individual motivation and the expression of cooperative behaviour.
It is also important to emphasize that the immediate benefits of a motivational trait on performance in experimental tasks need not translate into long-term fitness benefits observed in the wild (Scott-Phillips, Dickins, & West, 2011; West, Griffin, & Gardner, 2007b). Our study demonstrates that prosociality predicted the capacity of marmosets to achieve mutually beneficial cooperation in a coordination task, suggesting that prosocial motivation may be a target of selection whenever the immediate benefits of such prosocially motivated behaviours translate into ultimate fitness benefits. The application of evolutionary ecological theory is needed, however, to generate sensible predictions about the conditions under which such long-term benefits would in fact be realized (West et al., 2007b). For example, the potential benefits of prosociality for coordination and problem solving may nevertheless be offset by greater long-term costs of defection or exploitation from nonprosocial partners, leading to selection against generalized prosociality. As discussed above, asymmetric payoffs between social partners, such as in the dominant–subordinate relations typical of cooperative breeders (Phillips, 2018), may also select for the state-dependent expression of prosociality, contingent on social roles. Given that prosociality is most likely to evolve in contexts that generate mutually beneficial or synergistic fitness benefits (Akçay et al., 2009), it would be particularly valuable to investigate the benefits of prosociality in the context of behaviours such as cooperative hunting, which is most likely to occur when hunters have a low probability of successfully capturing large prey alone (Packer & Ruttan, 1988). Alloparenting is also commonly observed in cooperatively hunting species, such as coyotes, Canis latrans, and grey wolves, Canis lupus (Smith, Lacey, & Hayes, 2017), which thus present ideal model systems for testing the conditions under which the hypothesized benefits of prosociality for social coordination and problem solving may also translate into lifetime fitness benefits.
[...] Table A5), the aggregate effect across dyads at the group level was quite large (d = 1.15; see Fig. 3).
Wild marmosets also coordinate their behaviour in larger groups for cooperative territorial defence (Lazaro-Perea, 2001). Resource defence benefits are predicted to select for cooperative breeding in saturated environments (Lin, Chan, Rubenstein, Liu, & Shen, 2019; Shen et al., 2017) and may have enhanced prosociality among early human populations reliant on dense and predictable resources (Marean, 2016). Examining whether differential levels of group competition influence marmoset prosociality is therefore another exciting target for future investigation, as well as further understanding the influence of individual differences in prosociality (Carter, English, & Clutton-Brock, 2014; Schachner, Newton, Thompson, & Goodman-Wilson, 2018) on within-group cooperation and between-group competition more generally (Gavrilets, 2015; Majolo & Maréchal, 2017).
In addition to prosociality, we also observed positive effects of partner choice and individual social tolerance (see Fig. 4b and d) on performance in the loose string paradigm, supporting previous work identifying these factors as key mechanisms for primate cooperation (e.g. Hare et al., 2007; Jaeggi & Gurven, 2013; Melis, 2006; Melis et al., 2006; Molesti & Majolo, 2016; Sabbatini et al., 2012). The observed positive effect of age similarity on cooperation (Fig. 4c), irrespective of the total age of a dyad, may additionally reflect the fact that individuals of similar age tend to have stronger social bonds (e.g. de Waal & Luttrell, 1986; Silk, Alberts, & Altmann, 2006) and may be better able to coordinate their behaviour due to the benefits of phenotypic similarity for cooperation (Gabriel & Black, 2012; Laubu et al., 2016; Massen & Koski, 2014). The lack of support for personality similarity effects in our final model is surprising in this regard, as previous work has found positive associations between relationship quality and personality similarity among humans and nonhuman primates (Morton, Weiss, Buchanan-Smith, & Lee, 2015; Youyou, Stillwell, Schwartz, & Kosinski, 2017). The implications of this finding should be interpreted with caution, however, as previous evidence suggests high within-individual variability and group level consistency in marmoset behaviour (Koski & Burkart, 2015). The importance of personality similarity for marmosets may therefore be more apparent during tasks requiring group cooperation. The small number of social groups in the present study and the dyadic nature of our paradigm, as well as the absence of other pertinent personality trait measures, prevent us from clearly addressing this question.
While we conducted the loose string experiment within group enclosures to enhance ecological validity (Burkart et al., 2014; Cronin, Jacobson, Bonnie, & Hopper, 2017; Massen, Ritter et al., 2015), this limited our capacity to differentiate partner choice per se (see Fig. 4b) from differential access to the experimental paradigm. In addition, although the loose string paradigm necessarily requires temporal coordination in pulling, we were unable to effectively distinguish behavioural indicators of coordination among partners, as various uncontrolled stimuli in the home enclosure also influenced subjects' attention, visual orientation and vocalizations during testing. Despite these limitations, given that rates of success steadily increased for the average dyad across experimental sessions (Figs. 3, 4a), it is clear that subjects improved the coordination of their pulling over time (Asakawa-Haas et al., 2016). In addition, our results suggest that competition for access to the paradigm was unlikely to be an important determinant of task success, as individual social tolerance was a clear factor promoting dyadic cooperation. It is also unlikely that partner choice for cooperation was confounded by task motivation, as we controlled for individual differences in food motivation observed during the group service task. Nevertheless, the use of forced dyad designs, as well as paradigms designed for more than two participants, would be a useful tool for fully eliminating such confounds in future studies, as well as for better isolating potentially distinct individual, dyadic and group level factors relevant for marmosets' performance in these tasks.
Several alternative explanations for the observed effect of prosociality on cooperation are also possible, but unlikely. First, it is possible that our prosocial subjects performed better in the loose string task not because prosociality enhanced their cooperation with social partners, but rather because both measures reflect underlying differences in test comprehension among our subjects. We think this explanation is improbable, however, due to the extensive training provided to each social group prior to testing, as well as the large number of trials during testing. Indeed, all participating subjects evidenced sufficient comprehension to manipulate the loose string and group service apparatuses and procure food for themselves. Clear evidence of stimulus enhancement effects was only apparent for a few subjects in the group service paradigm (see Appendix, Fig. A3), with most adult subjects simply being motivated to receive food rather than provision their groupmates. General learning effects were also observed and accounted for across all dyads during the loose string paradigm, following training with the string-pulling mechanic during the social tolerance assessment. It is therefore unlikely that comprehension can account for the observed differences in dyadic cooperation between prosocial and nonprosocial subjects. However, given that the cooperative breeding hypothesis predicts that prosociality per se enhances performance in such tasks, it would be valuable for future research to further disentangle any benefits of generalized cognitive or problem-solving ability for performance from the specific effects of social motivation.
Alternatively, selfish rather than prosocial motivation may explain our results, as the variable reinforcement provided by motivational trials may have increased provisioning behaviour for some subjects during test sessions of the group service paradigm. If these individuals also had higher food motivation than their conspecifics during the loose string task, provisioning behaviour would be expected to predict cooperative success. While plausible, this explanation is poorly supported by our data, as prosocial subjects in the group service paradigm tended to receive fewer rewards than the groupmates they provisioned. All subjects also received multiple food rewards at both the provisioning and receiving platforms prior to our testing sessions, making it unlikely that provisioning in particular can be explained by selfish motivation or reinforcement learning. Given that the prosociality effect on cooperation remains after controlling for general food motivation, selfish motivation is also an unlikely explanation for performance in the loose string paradigm.
In conclusion, our study demonstrates the value of examining individual differences, which often remain a large but relatively unexplained source of variation in experimental tasks (e.g. Barrett, McElreath, & Perry, 2017; Watson et al., 2018), to better understand the proximate determinants of cooperative behaviour. By utilizing repeated experimental measures and multilevel statistical models, we detected heterogeneous individual motivation and behaviour among our subjects, which were otherwise obscured by the aggregate pattern in our sample. In addition, by using experimental controls to more confidently identify subjects who both understood the task and exhibited high prosociality, we were able to nevertheless test our hypothesis in the absence of evidence for prosociality across all subjects. In so doing, we found direct support for the positive, independent effects of prosocial motivation, social tolerance and partner choice on social coordination and problem solving in a mutually beneficial task among group members, consistent with the broader claim that these traits are potential targets of selection for cooperative behaviour in primates. These benefits, which were independent of age, sex, personality, food motivation and learning, may therefore help to explain why prosociality, social tolerance and partner choice are associated with various cooperative behaviours across primates (Burkart et al., 2014; Hare et al., 2007; Jaeggi & Gurven, 2013; Kaigaishi, Nakamichi, & Yamada, 2019; Melis, 2006; Melis et al., 2006; Molesti & Majolo, 2016; Sabbatini et al., 2012), as well as their putative role in the evolution of uniquely human forms of social cognition and cooperation (Barclay, 2016; Burkart et al., 2009; Hrdy, 2009; Tomasello et al., 2005).

Data Availability
The data set and R code supporting this article are available as supplementary material.

Subjects and housing
At the time of observation, the laboratory housed seven social groups composed of two to six individuals including the juvenile and subadult offspring of dominant breeding pairs. Data were only collected for groups containing more than three individuals and for adults and juveniles who had been weaned by the start of the experiments. This resulted in a total sample size of 23 individuals (mean ± SD age = 5.52 ± 4.86 years; 13 males, 10 females) in five social groups.

Personality scores
We used the regression method (Tabachnick & Fidell, 2014) to generate individual sociability and arousal scores from median posterior estimates of the Bayesian multiresponse model described in Martin et al.'s (2019) research on marmoset personality, which included 19/23 subjects from the present study. In particular, we calculated factor scores ($F$) for our subjects by

$$F = X G^{-1} S,$$

where $G^{-1}$ is the inverse among-individual random intercept correlation matrix, $S$ is the matrix of factor loadings (i.e. the correlation between the latent factors and each observed behaviour) and $X$ is the individual random intercept matrix for the observed behaviours. See Fig. A1 for a graphical overview of the sociability and arousal behavioural syndromes, including model parameters and specific behavioural indicators. These personality dimensions were derived using the Exploratory Graph Analysis + Generalized Network Modeling (EGA + GNM) framework, which was applied to monthly counts and durations of individual behaviours, generated from repeated focal animal sampling across 6 weeks of spring (April–May) and summer (May–July) observational periods. These data were collected within 3–7 months of the present study. Given that individual differences were found to exhibit moderate to high temporal consistency across observational periods, we considered personality scores from this previous study to accurately reflect personality in the present study.
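As an illustration of the regression method, the following minimal sketch computes factor scores as the product of the random intercept matrix, the inverse correlation matrix and the loading matrix. It assumes the standard regression-method form F = X G⁻¹ S (Tabachnick & Fidell, 2014); the helper functions and all numerical values are hypothetical and are not taken from the supplementary R code.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [matrix | I].
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def factor_scores(X, G, S):
    """Regression-method factor scores: F = X G^{-1} S."""
    return matmul(matmul(X, invert(G)), S)

# Hypothetical example: one individual, two behaviours, one latent factor.
X = [[1.0, 2.0]]              # individual random intercepts (individuals x behaviours)
G = [[1.0, 0.5], [0.5, 1.0]]  # among-individual correlation matrix (behaviours x behaviours)
S = [[1.0], [0.5]]            # factor loadings (behaviours x factors)
F = factor_scores(X, G, S)
```

Premultiplying the loadings by the inverse correlation matrix downweights behaviours that are redundant with one another, so that correlated indicators do not count twice towards a subject's score.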

Experimental procedures
Group service paradigm. Our experimental procedure consisted of five phases based upon previous work by Horn and colleagues (Horn et al., 2016). Small pieces of blueberry and grape were provided as high-quality food rewards throughout the study period.
Phase 0: Habituation to the apparatus. During an initial habituation phase, the group service apparatus (see Fig. 1a) was installed in each group's enclosure for a 2-week habituation procedure prior to training. Rewards were placed ad libitum on the apparatus to enhance subjects' approach motivation.
Phase I: Initial training and habituation to the procedure. Training began with the see-saw mechanism locked in a downward position, so that food placed on the board automatically slid to the wire mesh. On alternating days, food was provided either in Position 0 (the provisioning platform) or in Position 1 (the receiving platform). For 5 × group size trials, the marmosets' attention was called ('Monkeys!') and one piece of food was placed on the board. The next trial started after a subject obtained the food or after a maximum of 2 min. If a subject took the food, we then placed the next piece of food on the board as soon as no subject was sitting on the platform where food was provided in this session. If no subject took the food, we called the marmosets' attention again, lifted the same piece of food and placed it back on the board. A session ended after all trials or when none of the subjects stepped on the platform for three consecutive trials. We proceeded to the next phase after each marmoset had taken at least 10 pieces of food in a minimum of five sessions.
Phase II: Food distribution assessment. As in Phase I, the see-saw mechanism was locked in a downward position so that food placed on the board automatically slid to the wire mesh. For 5 × group size trials per group, we called the marmosets' attention and placed one piece of food in Position 1. The next piece of food was placed after a subject retrieved the food or after a maximum of 2 min. The session ended after all trials were completed or when none of the subjects stepped on the apparatus for three consecutive trials. In the latter case, the session was aborted and redone the following day. Two sessions were conducted for each group.
To measure the evenness of food distribution for each group, we calculated Pielou's $J'$ (Heip, Herman, & Soetaert, 1998) using the Shannon diversity index $H'$, where $J'$ is given by

$$J' = \frac{H'}{H'_{\max}}, \qquad H' = -\sum_{i=1}^{J} p_i \ln p_i,$$

for the proportion $p_i$ of food retrieved by the $i$th individual ($i = 1, 2, \ldots, J$) over 2 × (5 × n) trials per group. $H'$ quantifies the uncertainty of predicting the identity of the individual who retrieved the food on a randomly selected trial. $H'_{\max}$ quantifies the maximum possible state of uncertainty in a group, which is the case where all individuals are equally likely to retrieve the food, i.e. $p_i = \frac{J}{2 \times (5 \times \text{group size})}$. $H'_{\max}$ therefore corrects $H'$ for differences in group size.
See Table A2 for the $J'$ of each group. Overall, there was a strong tendency towards equitable food sharing ($J' \approx 0.9$), consistent with previous findings for common marmosets using the original group service paradigm (Burkart & van Schaik, 2013). Notably, Aurora group had a highly inequitable distribution ($J' = 0.13$), with a single male receiving nearly all the food rewards. This idiosyncratic outcome likely reflects the unusually steep dominance hierarchy within this small social group.
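The evenness index can be computed directly from the number of food items retrieved by each individual. The sketch below is a minimal illustration assuming the standard definition of Pielou's index, in which the maximum Shannon diversity for J individuals is ln J; the function name and the example counts are hypothetical and do not reproduce the values in Table A2.

```python
import math

def pielou_evenness(counts):
    """Pielou's J' for a list of food items retrieved per individual.

    counts holds one entry per group member, including members who
    retrieved nothing (zeros contribute no uncertainty to H').
    """
    n_individuals = len(counts)
    total = sum(counts)
    if n_individuals < 2 or total == 0:
        raise ValueError("need at least two individuals and one retrieved item")
    proportions = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in proportions)
    shannon_max = math.log(n_individuals)  # all individuals equally likely
    return shannon / shannon_max

# Hypothetical group of four over 2 x (5 x 4) = 40 trials.
fairly_even = pielou_evenness([12, 10, 9, 9])
monopolized = pielou_evenness([38, 1, 1, 0])
```

Because the index is a ratio of two entropies, the choice of logarithm base does not affect its value. An index near 1 indicates that food was shared almost equally, whereas a value near 0 indicates monopolization by a single individual.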
Phase III: Further training. In this phase, the marmosets learned how to use the see-saw mechanism and move food towards the wire mesh by stepping on Position 0 (i.e. the provisioning platform). Food was always placed in Position 0. To facilitate learning, the see-saw mechanism was first loosened only partially and food was placed close to the wire mesh. When each marmoset obtained food from the apparatus at least once, the mechanism was then loosened further. The see-saw mechanism was gradually loosened in three steps and the food was subsequently placed further away from the wire mesh. In the final step, the see-saw mechanism was completely released and the food was placed at the other end of the board. Per trial, we called the marmosets' attention and placed one piece of food on the board. The next trial started after a subject obtained the food or after a maximum of 2 min had passed. If a subject took the food, we then placed the next piece of food on the board as soon as no subject was sitting on the platform. If no subject took the food, we then called the marmosets' attention again, lifted the same piece of food and placed it back on the board. A session ended after all 5 × group size trials or when none of the subjects stepped on the platform for three consecutive trials. Sessions continued until each marmoset took at least 10 pieces of food in a minimum of five sessions with the see-saw mechanism completely released. Two subjects did not frequently participate in the final training sessions (Smart, Ginevra; see Table A1) but attained at least 10 pieces of food across sessions. We therefore considered these individuals to have met sufficient training criteria for the subsequent testing phase.
Phase IV: Group service training and test. In this phase, the see-saw apparatus' mechanism was completely released. We conducted five test sessions and five empty control sessions on alternating days. During each trial of a test session, we called the marmosets' attention and placed a food reward on Position 1 (the receiving platform). The next trial started either after a subject retrieved the reward through provisioning by a subject stepping on Position 0 or after a maximum of 2 min. Additionally, we implemented motivation trials with food in Position 0 at the beginning of each session and after every fifth regular trial, which ensured that a lack of provisioning did not reflect a lack of food motivation. Each session therefore consisted of 5 × group size regular trials and 1 + group size motivation trials. For each trial, we recorded which subject(s) stepped on the Position 0 platform (i.e. moved the see-saw mechanism), which subject(s) stepped on the Position 1 platform and which subject obtained the food reward. Stepping on the platform was only coded when a subject stopped moving across a platform. Following previous studies (Burkart & van Schaik, 2013; Horn et al., 2016), the first three sessions were considered as training sessions and the final two sessions were used as test sessions for data analysis. Due to experimental error, a single trial was skipped (24/25 completed) for Sprichtel group during their fourth group service session.
The empty control sessions were identical to the test sessions but no food was placed on the board. Instead, the experimenter approached the apparatus, called the marmosets' attention and pretended to leave a food reward in Position 1. Control sessions also comprised motivation trials with food in Position 0 and contained the same total number of trials as test sessions.
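The session structure described for Phase IV can be sketched as a simple schedule. This is an illustrative reconstruction only: the function name is hypothetical, and the sketch assumes that the final motivation trial follows the last regular trial, as implied by the stated total of 1 + group size motivation trials.

```python
def session_schedule(group_size):
    """Return the ordered trial types for one test or empty control session.

    One motivation trial opens the session, and a further motivation trial
    follows every fifth regular trial.
    """
    n_regular = 5 * group_size
    schedule = ["motivation"]
    for trial in range(1, n_regular + 1):
        schedule.append("regular")
        if trial % 5 == 0:
            schedule.append("motivation")
    return schedule

# Example for a group of five subjects.
schedule = session_schedule(5)
n_regular = schedule.count("regular")
n_motivation = schedule.count("motivation")
```

For a group of five subjects this yields 25 regular trials and six motivation trials, i.e. 31 trials per session.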
Phase V: Blocked control. The see-saw apparatus' mechanism was completely released as during the test and empty control sessions, but access at Position 1 was blocked with a transparent plastic barrier, so that the food was still visible but could not be obtained. This allowed us to assess whether stepping on the provisioning platform (Position 0) reflected some stimulus enhancement effect of the food reward moving closer to the enclosure rather than prosocial motivation. During each trial, we recorded which subject(s) stepped on the Position 0 platform. There were five blocked control sessions and five empty control sessions on alternating days. The procedure was otherwise the same as in the test and control sessions of Phase IV, with each trial lasting 2 min. New trials did not begin until there were no subjects on the Position 0 platform. As in Phase IV, the first three sessions were used for training and only the final two sessions were used for comparison across conditions.
A research assistant blind to our hypotheses independently coded whether stepping occurred in each trial for 20% of sessions across experimental conditions. Using a two-way, mixed effects, absolute agreement, single rater intraclass correlation coefficient (McGraw & Wong, 1996), our measurements were found to be highly reliable (ICC (3,1) = 0.99).
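For readers unfamiliar with this reliability measure, the sketch below computes a single-rater, absolute-agreement intraclass correlation from the mean squares of a two-way ANOVA, following the ICC(A,1) formula of McGraw and Wong (1996). The function name and the example ratings are hypothetical; this is not the code used for the reported analysis, and it assumes that this formula corresponds to the coefficient reported above.

```python
def icc_absolute_agreement(ratings):
    """Single-rater, absolute-agreement ICC from a two-way layout.

    ratings is a list of rows, one per coded unit (e.g. a trial), with one
    score per rater in each row.
    """
    n = len(ratings)      # number of coded units
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )

# Hypothetical binary codes (stepping yes/no) from two raters for four trials.
perfect = icc_absolute_agreement([[0, 0], [1, 1], [1, 1], [0, 0]])
# A constant offset between raters lowers absolute agreement.
offset = icc_absolute_agreement([[1, 2], [2, 3], [3, 4]])
```

Unlike a consistency coefficient, the absolute agreement form penalizes systematic differences between raters, which is why the second example falls below 1 even though the two raters rank the units identically.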

Loose string paradigm.
Phase 0: Habituation to the apparatus. Our experimental procedure began with a brief habituation phase in which the loose string apparatus was placed in front of the subjects' enclosures for 15–30 min periods intermittently for 2 weeks. Food rewards were placed on the moving platform ad libitum and slowly moved towards the subjects to habituate them to the mechanism. Small pieces of blueberry, grape and banana were provided as high-quality food rewards throughout the study.
Phase I: Social tolerance task and training. In addition to the initial habituation phase, each group received a further 180 trial period to familiarize subjects with the string-pulling task, enhance motivation towards the apparatus and assess individual social tolerance towards group members in the presence of food rewards. This social tolerance assessment was based on previous work using the loose string paradigm in ravens (Massen, Ritter et al., 2015). This phase consisted of 18 sessions of 10 trials each in which two untethered strings tied to food rewards were placed on the surface of the moving platform apparatus adjacent to one another. Subjects who tolerated one another at the platform could independently retrieve the food reward tied to the distal end of the string by pulling their respective string into the enclosure. For each trial, we called the attention of the marmosets ('Monkeys!') and simultaneously placed two independent strings on the apparatus, with one end of each string within arm's length of two standing platforms in each enclosure. A trial ended after both strings had been pulled in or after a maximum of 2 min, and we coded which monkeys retrieved the strings adjacent to one another.
Following previous work on loose string cooperation (Massen, Ritter et al., 2015), we derived a measure of social tolerance for each individual by averaging their mean single string retrieval rate in the presence of each group member across experimental sessions. We found that these retrieval rates were highly consistent among individuals across the 18 training sessions (ICC (3,1) = 0.60), such that the total dyadic social tolerance score also effectively controlled for the influence of prior single string training on each dyad's performance during the testing phase.
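The derivation of the social tolerance score can be sketched as a two-step average. The data layout, the subject labels and the function names are hypothetical, and the sketch assumes that the total dyadic score is the sum of the two partners' individual scores, which the text does not state explicitly.

```python
from statistics import mean

def social_tolerance(rates_by_partner):
    """Average a subject's mean retrieval rate across group members.

    rates_by_partner maps each group member to the subject's per-session
    single string retrieval rates in that member's presence.
    """
    return mean(mean(session_rates) for session_rates in rates_by_partner.values())

def dyadic_tolerance(scores, subject_a, subject_b):
    """Total dyadic social tolerance (assumed here to be the sum of both scores)."""
    return scores[subject_a] + scores[subject_b]

# Hypothetical retrieval rates for a group of three subjects over two sessions.
rates = {
    "A": {"B": [0.5, 0.7], "C": [0.2, 0.4]},
    "B": {"A": [0.6, 0.6], "C": [0.8, 1.0]},
    "C": {"A": [0.1, 0.3], "B": [0.3, 0.5]},
}
scores = {subject: social_tolerance(by_partner) for subject, by_partner in rates.items()}
dyad_ab = dyadic_tolerance(scores, "A", "B")
```

Averaging within partners before averaging across partners prevents a frequently present groupmate from dominating a subject's score.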
Phase II: Testing. A total of 40 testing sessions were subsequently conducted for each group, with 10 trials per session. The marmosets' attention was called to the apparatus at the beginning of each session as two equally sized food rewards were placed on the left- and right-hand side of the apparatus, within view of the respective test platforms. A single string was then tethered through the apparatus, raised up and placed within arm's length of the test platforms as soon as two subjects were present on the platforms (see Fig. 1b). A trial was coded as successful if the two subjects were able to pull the moving apparatus towards their enclosure and retrieve the food rewards; conversely, a trial was unsuccessful if the string was untethered and the subjects were unable to retrieve the reward, which could occur either through uncoordinated pulling or through a single subject pulling the entire string through to his or her platform. Each trial ended after either a successful or unsuccessful retrieval of the food reward or a maximum of 1 min. Sessions were aborted if three trials ended without any subjects attempting to use the apparatus. A total of one to four sessions were conducted per day for training and test sessions, contingent on the observed food motivation of the group.
A research assistant blind to our hypotheses independently coded whether successful cooperation occurred for 10% of experimental sessions per group. Our measurements were found to be highly reliable (ICC(3,1) = 0.99).
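The ICC(3,1) statistic reported above can be illustrated with a minimal sketch. It assumes the standard two-way ANOVA formulation for consistency of single measurements, ICC(3,1) = (MS_rows − MS_error) / (MS_rows + (k − 1) MS_error), where rows are the rated units (e.g. sessions) and the k columns are the repeated measurements (e.g. coders). The function name `icc_3_1` and the example counts are hypothetical and are not taken from the study's analysis code.

```python
def icc_3_1(ratings):
    """ICC(3,1) for a table with one row per rated unit and one column per rater.

    Assumes a complete table (no missing cells) with at least 2 rows and 2 columns.
    """
    n = len(ratings)        # number of rated units (rows)
    k = len(ratings[0])     # number of raters or repeated measurements (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical counts of successful trials per session from two coders
example = [[3, 3], [5, 5], [0, 1], [7, 7]]
print(round(icc_3_1(example), 3))
```

Values near 1 indicate that the repeated measurements rank the rated units almost identically.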

Statistical models and procedures
We estimated Bayesian generalized linear mixed-effects models (GLMMs) for all analyses using the R package 'brms' (Bürkner, 2017), which interfaces with the Stan statistical programming language (Carpenter et al., 2017). As noted in the main text, we employed a fully Bayesian approach to statistical estimation and inference (McElreath, 2016). Therefore, rather than relying upon null hypothesis tests and arbitrary designations of statistical significance, we used multiple sources of information to summarize and draw inferences from our posterior model estimates (McElreath, 2016;McShane et al., 2019;Wasserstein & Lazar, 2016). The R Code and the data set provided in the supplementary material can be used to replicate all analyses described below.
Group service paradigm. We first compared rates of stepping on the Position 0 provisioning platform across conditions during the test sessions (sessions 4 and 5) to assess whether the subjects understood the task and therefore stepped to provide food rather than because of a stimulus enhancement effect. To do this, we estimated a binomial GLMM predicting stepping on the Position 0 provisioning platform across subjects. Fixed effects included experimental condition (test, empty, control) and time of day to control for unbalanced sampling across morning and afternoon periods. Test was set as the reference category for the experimental condition. Random effects included subject-specific intercepts and slopes across experimental conditions, social group intercepts, and observation level intercepts to account for overdispersion (Harrison, 2014). We therefore estimated the following model conditional on our observed data for observation i of an individual's stepping rate during an experimental session.
$$\text{Step on Position 0}_i \sim \text{Binomial}(n_i, p_i)$$

where n is the group-specific trial number per session, p is the probability of stepping on Position 0, α are intercepts, β are fixed effects coefficients, Σ is the matrix of random effect standard deviations and R is the correlation matrix. This notation is used throughout the remainder of the Appendix. See Appendix, Fig. A3 and Table A3 below for the subject-specific stepping rates estimated from this model. Subjects estimated with high certainty to have stepped more frequently on the provisioning platform during the test condition compared to the empty and blocked control conditions were considered to have understood the task. Although seven subjects exhibited an understanding of the group service paradigm (see Fig. A3 and Table A3), five of these subjects appeared to exhibit appreciably higher prosocial motivation (see Fig. 2 and Fig. A4). To formally assess this difference in motivation, we further modelled provisioning behaviour for only these seven subjects, comparing provisioning rates during test sessions between the two older subjects with low Position 0 stepping rates and the five subjects with moderate to high rates (coded as a binary variable: high (1) or low (0) rate).
Model 2. The five subjects with appreciably higher prosocial motivation were categorized as 'prosocial' subjects throughout subsequent analyses.
Loose string paradigm. As previously noted, we prioritized the enhanced ecological validity of testing in the home enclosure for each social group, rather than conducting a standardized number of trials for all possible dyads. Dyad- and individual-specific outcomes were therefore contingent upon factors such as partner choice and access to the apparatus. In testing our primary hypothesis for the relationship between prosociality and cooperation, we began by analysing responses across groups before investigating dyad-specific outcomes. In particular, we predicted that a greater proportion of successful trials would involve a prosocial individual ('prosocial' trials (1)) compared to successful trials without a prosocial individual ('nonprosocial' trials (0)), irrespective of subject or dyad identity. Only one group was found to have two prosocial individuals (Sprichtel; see Table A1), and we therefore lumped successful trials containing one or two prosocial subjects together as 'prosocial trials' for comparison across groups. One group (Cleli) lacked any prosocial subjects and all of their successful trials were therefore coded as nonprosocial trials. Given that groups varied in the proportion of potential dyads containing a proactively prosocial individual ('proportion prosocial'), we controlled for the proportion of possible prosocial dyads in our analysis. Note that our low statistical power (5 social groups) prevented any substantive interpretation of group level covariates, including the estimated effect size for the proportion of prosocial dyads (raw success rates across groups are shown in Fig. A5). In addition, we included fixed effects for session number to account for learning effects, as well as time of day to control for any effect of unbalanced sampling across morning and afternoon periods.
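The trial coding and the 'proportion prosocial' covariate described above can be sketched as follows. This is an illustrative reading rather than the study's own code: it assumes that every pair of group members counts as a potential dyad, and that a dyad or trial is 'prosocial' when at least one partner is a prosocial subject. The function names and subject labels are hypothetical.

```python
from itertools import combinations


def is_prosocial_trial(subject_a, subject_b, prosocial):
    """Code a trial as prosocial (1) if either partner is prosocial, else nonprosocial (0)."""
    return int(subject_a in prosocial or subject_b in prosocial)


def proportion_prosocial_dyads(members, prosocial):
    """Share of all possible dyads in a group that contain at least one prosocial subject."""
    dyads = list(combinations(members, 2))
    if not dyads:
        return 0.0
    hits = sum(is_prosocial_trial(a, b, prosocial) for a, b in dyads)
    return hits / len(dyads)


# Hypothetical group of four subjects, one of whom is prosocial
group = ["S1", "S2", "S3", "S4"]
print(proportion_prosocial_dyads(group, {"S1"}))   # 3 of the 6 possible dyads
print(is_prosocial_trial("S2", "S3", {"S1"}))      # neither partner is prosocial
```

Under this reading, a group without any prosocial subjects has a proportion of 0, so all of its trials are coded as nonprosocial.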
Model 3. We subsequently analysed dyad level outcomes to account for nested responses within individuals and dyads, as well as to explore the potential influence of age, sex, personality, partner choice, social tolerance and food motivation on observed responses. We estimated the effect of prosociality on dyadic cooperation by comparing the rates of successful cooperation among dyads with (1) and without (0) one of the prosocial subjects. As described in the main text, we also included fixed effects for sex combination, partner preference and the effects of both dyadic similarity and total trait value for age (years), arousal and sociability. In addition, we included the total effect of social tolerance and food motivation to remove any bias in our prosociality measure, as these factors are expected to increase interest and success in experimental tasks and may therefore cause correlated performance across our tasks. Total values were simply the sum of each partner's scores, while similarity was calculated on a 0–1 scale as $1/(1 + |\text{value}_{\text{subject 1}} - \text{value}_{\text{subject 2}}|)$. To capture the influence of partner choice on cooperation, we used dyadic success in the previous experimental session as a predictor of current success (0/10 for the first experimental session, and x/10 for x successful trials in the previous session).
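The dyadic covariates described above can be illustrated with a short sketch. The similarity and total formulas follow the text directly. The lagged-success helper assumes that a dyad's sessions are supplied in chronological order with 10 trials each. All function names are hypothetical and not taken from the supplementary R code.

```python
def dyadic_total(value_1, value_2):
    """Total trait value: the sum of both partners' scores."""
    return value_1 + value_2


def dyadic_similarity(value_1, value_2):
    """Similarity on a 0-1 scale: 1 / (1 + |value_1 - value_2|)."""
    return 1.0 / (1.0 + abs(value_1 - value_2))


def previous_session_success(successes, trials_per_session=10):
    """Lagged predictor: 0/10 for the first session, then x/10 from the prior session."""
    lagged = [0.0]
    for x in successes[:-1]:
        lagged.append(x / trials_per_session)
    return lagged


# Hypothetical dyad: partners aged 2 and 5 years, three sessions with 3, 0 and 7 successes
print(dyadic_total(2, 5))
print(dyadic_similarity(2, 5))
print(previous_session_success([3, 0, 7]))
```

Identical partners receive a similarity of 1, and similarity decays towards 0 as the absolute difference between partners grows.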

Successful cooperation
Sociability and arousal scores were missing for four subjects who were not observed in our previous personality study. Removal of observations including these subjects would result in a significant and undesirable loss of statistical power. Mean imputation is a common solution to this problem, but this approach can potentially bias model estimates due to the assumption that responses are missing completely at random (Collins, Schafer, & Kam, 2001;McElreath, 2016). To assess the possibility of nonrandom missingness in our data, we used Bayesian imputation to examine missingness as a function of age, sex, and prosociality. No clear effects were observed, suggesting that mean imputation provided a simpler and appropriate solution. Scores of 0 were therefore imputed for the sociability and arousal z scores of these four subjects.
During any particular trial, subjects engaged in the loose string task may have used the left or right test platform, but we were interested in modelling the random effect of subject identity on dyadic outcomes irrespective of their position. Appropriately accounting for such arbitrary dyadic structuring, and thus accurately estimating uncertainty in subject random effects, requires the use of so-called multiple membership modeling (Collins et al., 2001). In addition to our multimembership random subject effect, we also included random effects for dyad, social group, observation and day of observation to account for unbalanced sampling across days. Given a high proportion of observed zeroes, we also estimated the probability $z_i$ of not cooperating during a test session to account for zero-inflation in the dyadic responses. We did not include additional effects to predict zero-inflation due to insufficient statistical power, and we therefore made the simplifying assumption that $z_i$ was constant across trials. Our final fully parameterized model therefore estimated the following effects conditional on our data for observation i of dyadic cooperation during each loose string session.
$$\text{Successful dyadic cooperation}_i \sim \text{ZIBinomial}(n_i, z_i, p_i)$$

To avoid overfitting our model given the moderate power of our sample, we used the fully Bayesian Watanabe–Akaike information criterion (WAIC) (Watanabe, 2010) to compare this full model to a reduced model excluding any terms that exhibited highly uncertain effects in Model 4.0. This reduced model had the following structure.

Successful dyadic cooperation
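For readers unfamiliar with the zero-inflated binomial likelihood used in these models, the following sketch gives its probability mass function. It assumes the standard mixture form, in which a session yields a structural zero with probability z and otherwise follows a Binomial(n, p) distribution. The function name is hypothetical, and the parameter values are arbitrary illustrations rather than estimates from the study.

```python
from math import comb


def zi_binomial_pmf(y, n, z, p):
    """P(Y = y) for a zero-inflated binomial with n trials.

    z: probability of a structural zero (no cooperation in the session)
    p: per-trial success probability when the dyad does engage
    """
    binom = comb(n, y) * p ** y * (1 - p) ** (n - y)
    if y == 0:
        # Zeroes arise either structurally or from the binomial component
        return z + (1 - z) * binom
    return (1 - z) * binom


# Arbitrary illustration: 10 trials, 40% structural zeroes, 30% per-trial success
probs = [zi_binomial_pmf(y, 10, 0.4, 0.3) for y in range(11)]
print(round(probs[0], 4), round(sum(probs), 4))
```

Holding z constant across observations, as described above, means that only p varies with the predictors.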
As reported in the main text, our model comparison provided support for the reduced Model 4.1 excluding uncertain effects (see Table A4), suggesting that this more parsimonious structure better represents our data. Results reported in the main text are therefore estimated based upon Model 4.1.
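The WAIC comparison can be illustrated with a minimal sketch of how the criterion is computed from pointwise log-likelihoods. It assumes the common definition WAIC = −2(lppd − p_WAIC), where lppd is the log pointwise predictive density and p_WAIC sums the posterior variance of each observation's log-likelihood. The function name and input layout are hypothetical. In practice the criterion would be obtained from the fitted model objects rather than computed by hand.

```python
import math


def waic(log_lik):
    """WAIC from log_lik[s][i]: log-likelihood of observation i under posterior draw s.

    Requires at least two posterior draws.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # Log of the mean likelihood, computed stably via the log-sum-exp trick
        peak = max(column)
        lppd += peak + math.log(sum(math.exp(v - peak) for v in column) / n_draws)
        # Effective number of parameters: sample variance of the log-likelihood
        mean = sum(column) / n_draws
        p_waic += sum((v - mean) ** 2 for v in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic)


# Hypothetical log-likelihoods: 3 posterior draws x 2 observations
full_model = [[-1.2, -2.5], [-1.0, -2.1], [-1.4, -2.9]]
reduced_model = [[-1.1, -2.3], [-1.0, -2.2], [-1.2, -2.4]]
print(waic(full_model) > waic(reduced_model))
```

Lower values indicate better expected out-of-sample predictive accuracy, so the model with the smaller WAIC is preferred.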
Two juvenile subjects in Cleli group (see Table A1) could not be reliably sexed during loose string testing, as our laboratory does not use invasive tagging procedures and these twins could not yet be distinguished by their facial features or genitalia. To account for uncertainty in the assignment of subject identity for these individuals, we used an additional Bayesian multiple imputation procedure. Relative to the covariates in our model, these subjects only differed in their sex. Given that the sexes were not observed to differ in their frequency of cooperation (mean ± SD: males: 3.16 ± 2.90 trials; females: 3.60 ± 2.89 trials), we therefore assumed an equal probability of observing either the male or female juvenile across trials (p = 0.50) and produced five data sets with different randomly generated trial level sequences of identity assignment. We subsequently fitted Model 4.1 to each of these data sets and pooled their posteriors together to account for the influence of uncertainty in trial level assignment on our model estimates (Gelman et al., 2014a). Values reported in the main text are based on this multiple imputation model. Note that very little variance in parameter estimates was observed across the imputed data sets, suggesting that our results were highly robust to this procedure.

Years of age are rounded to the nearest whole number and reported for the start of the study period. Note that the subject abbreviations are used in Supplementary data set.
a Current or previous breeders within a social group.
b These twin juveniles had been weaned by the time of testing but were not yet named by the laboratory.
c These subjects did not participate during any loose string sessions.

Higher values of Pielou's J′ indicate a more equitable distribution of food across subjects. The calculation of J′ is described in the Appendix (Group service paradigm, Phase II) above.
$p_i$ refers to individual-specific probabilities of attaining the food reward across two experimental sessions. See Table A1 for further details on the subjects and social groups.

Values are based upon posterior random slopes estimated from Model 1 described above. $p_{\text{Test}}$ is the estimated median probability of stepping on the Position 0 platform during test condition experimental sessions. $p_{\text{Blocked}-}$ is the posterior probability of a subject exhibiting a lower rate of stepping on Position 0 during the blocked condition than in the test condition, while $p_{\text{Empty}-}$ is the posterior probability of a subject exhibiting a lower stepping rate in the empty condition. Probability decimals are rounded to the nearest hundredth. See Table A1 for further details on the subjects and social groups.
a Subjects exhibiting evidence of understanding the task.
b Subjects exhibiting an understanding of the task and moderate to high prosocial motivation.

Bolded values represent parameters selected for the reduced Model 4.1, which was selected to decrease the risk of overfitting and enhance model generalization. Parameters with a posterior probability of $p_+$ or $p_-$ < 0.90 were removed due to high statistical uncertainty for the direction of the effect.

Correlation coefficients are reported for individual covariates used in our analysis of dyadic cooperation. Biserial and tetrachoric correlations are reported for prosociality (0 = not prosocial, 1 = prosocial) and sex (0 = male, 1 = female) because these are binary variables.

Figure A3. Subject-specific probabilities of stepping on Position 0 platform across experimental conditions. Posterior median ± SD probabilities are shown. These estimates are derived from Model 1 subject random slopes. Pink lines indicate subjects who exhibited evidence of understanding the group service task (i.e. higher test condition stepping probability). See Fig. A4 below for comparisons between the subjects with relatively low (ERN, OLI) and high (AUR, LNA, MAT, NLA, SMB) test session stepping rates. The latter subjects were classified as prosocial.

Figure A4. Proportion of provisioning behaviour among subjects passing the group service task. Subjects are categorized by whether they exhibited a high or low stepping rate at the Position 0 platform during group service test sessions (see Fig. A3 and Table A3). The Y axis describes the proportion of experimental trials during which the subject successfully provisioned another group member with a food reward. Proportions are reported to account for differences in the absolute number of trials across social groups. Subjects with high stepping rates were classified as prosocial. See Table A1 for subject abbreviations.

See Table A1 for further details on each social group.

Dyadic cooperation as a function of success in the previous session. Density histograms are displayed for counts of dyadic success in the loose string task, faceted by the number of successful trials a dyad completed during the previous experimental session (0/10 to 9/10). Relative rather than absolute counts are displayed to facilitate more direct comparisons between low and high counts of success, given the much higher occurrence of a low success rate. Success in the first experimental session is not displayed because all dyads had zero previous success. Sessions in which a dyad successfully completed 10/10 loose string trials are also not displayed due to this only occurring in two sessions.