Introduction

Being beautiful is beneficial. This is the quintessence of many years of research, as physically attractive individuals are expected to be more intelligent, benevolent and competent (Eagly et al., 1991; Feingold, 1992; Langlois et al., 2000; Shinada & Yamagishi, 2014a) and are thus more popular than unattractive individuals (Boyatzis et al., 1998). Considering the wide body of evidence for a high interrater agreement on who is attractive and who is not (Jefferson, 2004; Langlois et al., 2000; Little, 2014), people who are generally considered attractive enjoy a great number of advantages in everyday life. They are thought to be more cooperative (Andreoni & Petrie, 2008) and trustworthy (Wilson & Eckel, 2006) and are expected to “possess more socially desirable personality traits” (Dion et al., 1972, p. 286).

But beauty does not only bias social first encounters and relationships; it also pays off in the labor market. Hamermesh and Biddle (1993) found an income gap of 5% between highly and moderately attractive workers, an advantage for handsome laborers that they called the “beauty premium”. Additionally, evidence for a “plainness penalty” was found, indicating 7 to 10% lower earnings for unattractive individuals compared with average-looking ones. Thus, there is a beauty gap of roughly 15% between the least and most attractive individuals. By now, the term “beauty premium” often refers to both the monetary and the interpersonal advantages a person enjoys solely based on his or her attractiveness, while the “plainness penalty” refers to the disadvantages a person faces due to his or her unattractiveness. The existence of a beauty premium was also supported by similar results that Solnick and Schweitzer (1999) found in an Ultimatum Game (UG). Here, handsome people were provided with 8 to 12% more money than unattractive subjects, even though they neither demanded more nor offered more money than less attractive subjects. Interestingly, more money was demanded of attractive players than of moderately attractive or unattractive players. A variety of experiments supported the existence of a beauty premium in the context of economic games, with both cooperation rates and generosity rising with increasing attractiveness of the counterpart (e.g., Ma & Hu, 2015; Mulford et al., 1998; Zaatari & Trivers, 2006; Zhao et al., 2015). For a review of the benefits of being attractive on the interaction partner’s decision in both the labor market and a variety of economic games, see Maestripieri et al. (2017).

However, while studies have shown advantages of physical attractiveness, they have also uncovered disadvantages that come with the fortune of being beautiful. In their investigation of behavior in a Trust Game (TG), Wilson and Eckel (2006) found both a beauty premium and a beauty penalty, the latter affecting attractive individuals who did not meet the (high) expectations of their counterpart. Participants punished those subjects who disappointed them by not fulfilling their hopes, and the severity of this penalty correlated positively with the counterpart’s attractiveness, meaning that attractive counterparts were punished more severely than unattractive ones. Similar effects were found in a study by Andreoni and Petrie (2008), where attractive players were punished more harshly than less attractive counterparts when not living up to the counterparts’ expectations in a public goods dilemma. There are, however, also opposite findings, as Putz et al. (2016) discovered a beauty priority for attractive free riders compared to less attractive ones in a Third-Party Punishment and Reward game. The reported studies are summarized in Table 1 to provide an overview of the state of research.

Table 1 Summary of studies reporting the effects of attractiveness in economic games

Another exception to the general assumption that beautiful people are better off was identified in two studies conducted by Agthe et al. (2010, 2011). In their studies, participants were asked to make decisions in the context of organizational judgments. It turned out that the pro-attractiveness bias held only for attractive individuals of the opposite sex, whereas the effect was not found for same-sex individuals. Moreover, in one of the studies (Agthe et al., 2010), highly attractive counterparts were even discriminated against by moderately attractive but not by highly attractive participants. The authors suggest that a moderately attractive individual sees a highly attractive same-sex counterpart as a social threat and will thus disadvantage that counterpart. Li and Zhou (2014) likewise found supporting evidence for a beauty penalty that is mediated by the counterpart’s gender in a third-party dictator game. Here, attractive proposers of the same sex as the participant were punished more severely than unattractive ones when they disappointed the players’ sense of fairness. That attractiveness can lead to disadvantages is also reflected in real life. In an early study by Sigall and Ostrove (1975), attractive swindlers were punished more harshly than unattractive ones. Likewise, an early study by Dermer and Thiel (1975) found a linear relationship between a woman’s attractiveness and socially undesirable attributions made by other women, such as vanity, egoism, and snobbishness.

Despite the large body of research already conducted, some points require further clarification. Firstly, research is needed to examine whether the positive effect of attractiveness on social decisions is overarching, or whether different people are affected differently by the counterpart’s attractiveness. Secondly, we aim to clarify whether and how the sex of both the participant and the counterpart moderates the effect of attractiveness on partnership. Lastly, research is needed to find out whether and under which circumstances a beauty penalty exists.

Addressing the first question, this study delves into personality factors, namely agreeableness, on the participant’s side and examines their influence on social behavior. As agreeableness was found to be a significant predictor of cooperation in a Public Goods Dilemma (Volk et al., 2011) and in a Prisoner’s Dilemma (PD; Kagel & McGee, 2014), we focus on the personality construct agreeableness and its facets. Agreeableness is a Big Five personality trait which refers to the tendency to behave altruistically, cooperatively and trustingly (e.g., Rothmann & Coetzer, 2003). It is associated with the motivation to preserve relationships (Jensen‐Campbell & Graziano, 2001) and, as a consequence, with the striving for positive interaction (Meier et al., 2010). We aim to replicate the finding of agreeableness predicting cooperation in the context of other economic games. Further, we want to examine whether one facet of agreeableness is particularly predictive of partnership. In a multiparadigm study conducted by Ruch et al. (2017) including several economic games, honesty and humility was found to predict decisions in favor of the interaction partner. Even though agreeableness had no incremental impact when including humility and honesty into the predictions, the findings support our hypothesis insofar as agreeableness correlated to a notable degree with honesty and humility. The “what is beautiful is good” heuristic (Dion et al., 1972) suggests a more positive interaction with attractive compared with less attractive counterparts. Because positive interaction is what agreeable individuals strive for, attractive counterparts are expected to benefit more from their beauty when facing an agreeable decision maker than when facing a less agreeable one. For a more differentiated view, in our study agreeableness will be assessed with its facets of trust, cooperation, morality, modesty, altruism and sympathy.

Furthermore, we aim to clarify the influence that both the opponent’s and the participant’s sex have on the decision to cooperate, trust, and behave altruistically. Hitherto, a particularly large body of research exists examining sex differences concerning participants, though the results are ambiguous. Eckel and Grossman (2001), for example, found women to be more generous than men in a UG, and Ortmann and Tichy (1999) discovered women to be initially more often cooperative in a PD when compared to men. Other studies, however, challenged the generalization of the assumption that women behave more often in ways of partnership by showing men to be more helpful (Eagly et al., 1991) and trusting (Buchan et al., 2008) compared with women. In their summarizing meta-analysis, Balliet et al. (2011) found women and men to cooperate to roughly the same degree, averaged over a large number of studies and years of research. However, different circumstances, such as the partner’s sex, the type of dilemma, and the year of publication, led to differences in cooperation rates between women and men. It thus appears to be relevant for the decision to cooperate whether participants are facing a same-sex or an opposite-sex opponent.

Concerning the partner’s sex in general, previous researchers again found partly conflicting results. In UG studies, men were preferred as receivers, as they gained significantly more money than women (Solnick, 2001; Solnick & Schweitzer, 1999), whereas offers coming from women rather than men were more likely to be accepted (Eckel & Grossman, 2001). However, studies by Dufwenberg and Muren (2006) and Saad and Gill (2001) questioned these results, with women receiving more money in a Dictator Game (DG) compared with men; in a PD, too, women were shown to receive more cooperative responses than men (Ferguson & Schmitt, 1988). In a Trust Game (TG) conducted by Buchan et al. (2008), no sex differences emerged.

When including the partner’s attractiveness into the equation, sex differences become more consistent. The beauty benefit seems to be larger for females than for males. This effect was found in the labor market (e.g., Busetta et al., 2013; French, 2002) as well as in different economic games (Kahn et al., 1971). In their extensive review, Maestripieri et al. (2017) report a great body of evidence supporting the idea of a greater attractiveness-related bias for female counterparts than for male counterparts. Specifically, nine studies reported greater or exclusive advantages for females, whereas only one study found advantages for males. Four studies found no gender differences. Furthermore, prosocial and financial biases in favor of attractive individuals were found more often in opposite-sex interactions than in same-sex interactions. The studies of Agthe et al. (2010, 2011) indicated an attractiveness bias that exists exclusively when the counterpart is of the opposite sex. Maestripieri et al. (2017) listed seven studies in total that found greater or exclusive biases in opposite-sex interactions. Farrelly et al. (2007), for example, found that attractiveness influences cooperation only with opposite-sex partners, but not with same-sex partners. However, their hypothesis that attractiveness should bias males’ decisions more strongly than females’ could not be fully supported. In contrast, none of the included studies found evidence for stronger biases in same-sex constellations. Only in a study conducted by Rosenblat (2008) were there no differences between same-sex and opposite-sex biases. The preference for attractive interaction partners of the opposite sex is often explained from an evolutionary perspective, as attractiveness serves as a signal of health and fertility (Maestripieri et al., 2017).

With our study, we hope to shed further light on the role of attractiveness of same sex and opposite sex cooperation partners with males and females, respectively, in charge of the decision to trust, cooperate and behave altruistically.

To operationalize the willingness to behave in ways that can be seen as an expression of partnership, a variety of economic games are widely established. For measuring cooperation, the Ultimatum Game (UG; Güth et al., 1982) and an economic version of the Prisoner’s Dilemma (PD; e.g. Tullock, 1985) are often used. Trust, which may be defined in the context of economic games as the “willingness to bet that another person will reciprocate a risky move” (Camerer, 2003, p. 85) can be measured with the Trust Game (TG), devised by Berg et al. (1995). Lastly, the Dictator Game (DG; Forsythe et al., 1994) examines the individual’s willingness to behave altruistically.

Using these four operationalizations of partnership, participants differing in agreeableness will face male and female counterparts of varying attractiveness. For each counterpart, they decide whether or not they want to behave in a trusting, cooperative or altruistic way. Thus, the influence of the counterpart’s attractiveness, the participant’s agreeableness and both individuals’ sex can be examined. Taking all the outlined research and questions into account, we formulated four hypotheses.

Firstly, we expect participants to be more trusting (TG), more cooperative (PD), and more generous (DG) when facing a highly attractive counterpart than when facing an unattractive or moderately attractive counterpart. Secondly, when the offer is unfair (1 € in the UG), attractive proposers are, in line with the beauty penalty, expected to be punished more severely than less attractive counterparts, which is reflected in higher rejection rates. In fair-offer situations (3 € and 5 € in the UG), participants are expected to accept more offers from attractive counterparts in comparison with fair offers from unattractive counterparts. Thirdly, participants who score higher on the Big Five personality factor agreeableness are expected to increase their decisions in favor of more attractive partners compared to moderately or less attractive partners. Additionally, we will use a data-driven approach to test exploratorily whether a model that includes one of the facets of agreeableness fits the data better than our confirmatory analyses, which only use the agreeableness factor. Lastly, female partners are expected to benefit more from their attractiveness than male partners. Furthermore, we expect an interaction between the participant’s and the partner’s sex with a greater attractiveness bias for opposite-sex than for same-sex partners.

Participants and Procedure

Disclosure of Sample, Conditions, Measures, and Exclusions

We hereby confirm that we have reported all measures, conditions, data exclusions, and how we determined the sample size.

Sample

We calculated the sample size a priori using G*Power (version 3.1.9.7, Faul et al., 2009). Investigating the relation between personality traits and behavior, we expected to analyze correlations with small to medium effects. Assuming an effect of r = 0.2, the calculation using α = 0.05 and power (1-β) = 0.80 resulted in a required sample size of at least N = 193. In total, 210 participants with complete datasets were recruited using the paid research participation system “SONA”, postings on the internet, and the clickworker platform (www.clickworker.com). The participants were between 18 and 58 years old (Mage = 21.5, SDage = 3.85), 52% were female, 185 identified as heterosexual, 14 as homosexual, and 10 as bisexual. Participants received monetary compensation or course credit for participation. Participants were recruited in two waves. The second wave was initiated in order to achieve an equal distribution of gender.
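The a priori calculation can be approximated by hand. The following is a minimal sketch using the Fisher z approximation for a two-tailed test of a correlation; it is not G*Power's exact routine, and the two-tailed setting and the function name are assumptions made for illustration. The approximation yields a value within one participant of the N = 193 reported above.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a correlation r (two-tailed test).

    Fisher z approximation; G*Power uses an exact method, so results
    can differ slightly from this sketch.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    effect = math.atanh(r)                         # Fisher z-transformed correlation
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


n = required_n_for_correlation(0.2)
print(n)  # 194 under this approximation
```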

Stimulus Material

The data set of counterparts consists of white faces with a neutral expression from the Chicago Face Database (CFD; Ma et al., 2015). The CFD includes 90 pictures of white women and 93 pictures of white men, which have already been rated regarding their attractiveness on a Likert scale from 1 to 7 by an independent rater sample. To form categories, we chose the three most and the three least attractive faces of each gender, and additionally the three faces around the 50th percentile. Thus, we selected 18 faces of different persons altogether. The mean attractiveness of the attractive individuals was 4.8, SD = 0.46 (M = 5.15, SD = 0.3 for female faces and M = 4.46, SD = 0.29 for male faces). The moderately attractive counterparts were rated with a mean of 3.16, SD = 0.25 (M = 3.39, SD = 0.01 for female faces and M = 2.94, SD = 0.04 for male faces). The least attractive counterparts were rated with a mean of 1.74, SD = 0.09 (M = 1.68, SD = 0.06 for female faces and M = 1.81, SD = 0.08 for male faces).
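The selection rule described above (three lowest-rated, three median, and three highest-rated faces per gender) could be sketched as follows. The function name, the face identifiers, and the ratings are hypothetical and serve only as an illustration; how ties or the exact median window were handled in the study is not stated, so centring the window on the median is an assumption.

```python
def select_stimuli(ratings: dict[str, float], per_level: int = 3) -> dict[str, list[str]]:
    """Pick the lowest-, middle-, and highest-rated faces for one gender."""
    ranked = sorted(ratings, key=ratings.get)    # ascending attractiveness
    mid_start = (len(ranked) - per_level) // 2   # window centred on the median (assumption)
    return {
        "low": ranked[:per_level],
        "moderate": ranked[mid_start:mid_start + per_level],
        "high": ranked[-per_level:],
    }


# Hypothetical ratings for nine faces of one gender (values are invented).
demo_ratings = {
    f"face_{i}": rating
    for i, rating in enumerate([1.6, 1.7, 1.9, 2.8, 3.1, 3.2, 4.3, 4.6, 5.0], start=1)
}
selection = select_stimuli(demo_ratings)
print(selection)
```

Applying the function once per gender would give the 2 × 3 × 3 = 18 faces used in the study.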

Instruments

To examine the participant’s Big Five personality trait agreeableness, we included the relevant subscales from the German translation of the IPIP-NEO (International Personality Item Pool Representation of the NEO-PI-R), the IPIP-240 (Schreiber & Iller, 2017; see Footnote 1). The subscales for the personality factor agreeableness (Cronbach’s α = 0.91) are trust (Cronbach’s α = 0.81), altruism (Cronbach’s α = 0.79), modesty (Cronbach’s α = 0.73), cooperation (Cronbach’s α = 0.67), morality (Cronbach’s α = 0.80), and sympathy (Cronbach’s α = 0.72), each of which is measured with eight items on a 5-point Likert scale.
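For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch computes Cronbach’s α from a participants × items response matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of sum scores). The function name and the demo responses are invented for illustration and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """responses[p][i] is participant p's answer to item i."""
    k = len(responses[0])                                      # number of items
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])      # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of four participants to three items (1 to 5 scale).
demo_responses = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(demo_responses)
print(round(alpha, 2))
```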

Task and Procedure

The study was made available to the participants online on SoSciSurvey (Leiner, 2019). After answering sociodemographic questions, the participants were confronted with the abovementioned questionnaires in random order. Afterwards, the games were introduced in a randomized order. Each game consisted of a short introduction in which we explained the subsequent game.

In the DG, participants in the role of the dictators had the opportunity to pass as much as they wanted of an endowment of 10 € to the receiver. In this paradigm, receivers cannot decide whether they want to accept the money or not, it is simply split according to the dictators’ will. By contrast, in the TG, trustors receive an endowment from the experimenter and can then decide how much (if anything) they want to pass to the receiver. The entrusted amount of money will be tripled by the experimenter and passed to the trustees, who can now decide how much (if anything) of the tripled amount they want to return to the trustor. In the present study, participants only acted as the trustor and had the choice of how much of an endowment of 10 € they wanted to pass to differently attractive trustees. The initially entrusted amount of money indicates the willingness to trust an unknown person (Eckel & Wilson, 2004). The UG comprises a bargaining situation in which proposers receive a fixed amount of money. They then can decide how much (if anything) of it they want to pass to the receiver, who, on the other hand, can decide whether they accept the offer or not. If they choose to accept, the money is divided according to the proposer’s distribution. If the receiver decides to reject the offer, however, none of the players receive any money. In the present study, participants were only assigned to the role of the receiver. Finally, we included an economic version of the PD to measure cooperation in a situation where cooperating is, on an individual level, not a rational choice. To simplify the paradigm, the game was played with fixed rates of money which the players virtually received depending on their decisions. For each individual, defection results in better outcomes than cooperation. However, for the common profit, cooperation leads to better outcomes. For the present study, we considered only the initial decision to cooperate or not.
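The payoff rules of the DG, TG, and UG described above can be summarized in a short sketch. The function names are hypothetical, and the 10 € pot used in the UG example is an assumption, since the text only states that proposers receive a fixed amount. The PD is omitted because its fixed payoffs are not reported in this section.

```python
def dictator_game(endowment: float, given: float) -> tuple[float, float]:
    """Return (dictator payoff, receiver payoff); the receiver cannot refuse."""
    assert 0 <= given <= endowment
    return endowment - given, given


def trust_game(endowment: float, sent: float, returned: float,
               multiplier: float = 3) -> tuple[float, float]:
    """Return (trustor payoff, trustee payoff); the sent amount is tripled."""
    assert 0 <= sent <= endowment
    received = sent * multiplier
    assert 0 <= returned <= received
    return endowment - sent + returned, received - returned


def ultimatum_game(pot: float, offer: float, accepted: bool) -> tuple[float, float]:
    """Return (proposer payoff, receiver payoff); rejection leaves both with nothing."""
    assert 0 <= offer <= pot
    return (pot - offer, offer) if accepted else (0, 0)


print(dictator_game(10, 3))             # (7, 3)
print(trust_game(10, 4, 6))             # (12, 6)
print(ultimatum_game(10, 1, False))     # (0, 0), pot size is an assumed value
```

In the present study, only the first move of the TG (the amount sent) and the receiver's accept or reject decision in the UG were recorded.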

DG, TG, and PD comprised 18 trials (i.e., males and females with 3 attractiveness levels and 3 faces per level) which were presented in random order within each game (for reliabilities, see Table 2). In the UG, in contrast, each counterpart was shown three times, offering 1 €, 3 €, and 5 €, respectively. Thus, the study included 54 trials in the UG, and 54 trials in the other three games collectively, making 108 trials altogether.
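The trial counts follow directly from crossing the design factors. As a brief illustration (the variable names and labels are chosen here and are not taken from the study materials):

```python
from itertools import product

PARTNER_SEX = ("female", "male")
ATTRACTIVENESS = ("high", "moderate", "low")
FACES_PER_LEVEL = (1, 2, 3)
UG_OFFERS_EUR = (1, 3, 5)

faces = list(product(PARTNER_SEX, ATTRACTIVENESS, FACES_PER_LEVEL))  # 18 counterparts
ug_trials = list(product(faces, UG_OFFERS_EUR))                      # each face, each offer
other_trials = list(product(("DG", "TG", "PD"), faces))              # 18 trials per game

total_trials = len(ug_trials) + len(other_trials)
print(len(faces), len(ug_trials), len(other_trials), total_trials)   # 18 54 54 108
```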

Table 2 Reliabilities (Cronbach’s α) of the different faces used in this study

To make sure that our participants perceived the attractiveness in the same manner as we expected them to, we asked them to rate each counterpart regarding their attractiveness on a Likert-scale from 1 to 7.

Statistical Analyses

All models were analyzed using RStudio (R Core Team, 2020) and the lme4 package (Bates et al., 2015).

As a manipulation check, we analyzed the attractiveness of the stimulus material using a mixed model with the factors participant sex (two levels: female and male), partner attractiveness (three levels: high, moderate, and low) and partner sex (two levels: female and male) and a random slope for participant.

For the DG and the TG, we calculated a linear mixed model with the money selected by the participants as the continuous dependent variable. The UG and PD were analyzed using logistic mixed models with the binary outcomes acceptance versus rejection and cooperation versus non-cooperation, respectively, as dependent variables. In all four models, the factors participant sex (baseline category: male), partner attractiveness (baseline category: moderately attractive), and partner sex (baseline category: male) were used as predictors, and participant was included as a random intercept. Additionally, in the UG the offer size was entered as a factor (baseline category: 3 €). Post-hoc tests were adjusted using the Bonferroni correction. For the confirmatory analyses, agreeableness was centered around its mean and used as an additional predictor. For the exploratory analyses, we first tested whether the agreeableness factor and its facets were differentially correlated with the overall outcome of each game (i.e., averaged across all trials). To do this, we compared the highest correlation between the personality traits and each game outcome with the second highest correlation. Moreover, for each game, we computed an intercept-only model, a model with only the categorical variables as predictors (i.e., participant sex, partner attractiveness, and partner sex), one model for each facet interacting with all categorical variables, and one model with the Big Five factor agreeableness interacting with all categorical variables. Due to several high correlations between the facets (see Table 3), we did not include multiple facets simultaneously. In the exploratory results section, we report the models with the lowest corrected Akaike information criterion (AICc) for each game, as depicted in Table 5.
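Three of the steps named above (mean centering, Bonferroni adjustment, and model selection by AICc) can be illustrated with a minimal sketch. The models themselves were fitted in R with lme4; the helper names below are hypothetical, the log-likelihoods and parameter counts are invented, and the choice of the observation count entering the AICc correction for mixed models is an assumption not specified in the text.

```python
from statistics import mean


def mean_center(values: list[float]) -> list[float]:
    """Subtract the sample mean so that zero corresponds to an average score."""
    m = mean(values)
    return [v - m for v in values]


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """AIC with the small-sample correction 2k(k + 1) / (n - k - 1)."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_model(aicc_by_model: dict[str, float]) -> str:
    """Return the name of the model with the lowest AICc."""
    return min(aicc_by_model, key=aicc_by_model.get)


# Invented log-likelihoods and parameter counts, for illustration only.
candidates = {
    "intercept": aicc(-520.0, 3, 200),
    "factors": aicc(-500.0, 14, 200),
    "agreeableness": aicc(-480.0, 26, 200),
}
print(best_model(candidates))
```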

Table 3 Correlations and descriptive statistics. A total of N = 210 (109 female) participants were included

Results

The reliabilities of the behavior with respect to the different attractive faces, as well as descriptive statistics and correlations, are presented in Tables 2 and 3.

Manipulation Check

The model regarding the stimulus material (\(R_{m}^{2} = .62\)) showed a significant difference regarding its attractiveness (\(\chi^{2}(2) = 432.52\), p < 0.001). The highly attractive faces (M = 5.09, SD = 0.91) were rated as more attractive compared to the moderately attractive (M = 3.61, SD = 0.83; p < 0.001, β = 1.48, s.e. = 0.05) and unattractive faces (M = 2.16, SD = 0.78; p < 0.001, β = 2.93, s.e. = 0.05). Moreover, moderately attractive faces were rated as more attractive compared to unattractive faces (p < 0.001, β = 1.44, s.e. = 0.05). The model also revealed that women were perceived as more attractive compared to men (\(\chi^{2}(1) = 10.96\), p < 0.001, β = 0.35, s.e. = 0.11). Furthermore, there was a highly significant interaction effect of partner attractiveness and partner sex (\(\chi^{2}(2) = 46.55\), p < 0.001). Post hoc tests indicated that attractive and moderately attractive partners were rated as more attractive when they were female as compared to male (p ≤ 0.001, β ≥ 0.28, s.e. = 0.07). In contrast, unattractive partners were rated as less attractive when they were female as compared to male (p = 0.002, β = -0.23, s.e. = 0.07). In addition, the two-way interaction between partner attractiveness and participant sex (\(\chi^{2}(2) = 40.43\), p < 0.001) showed that female as compared to male participants rated attractive partners as more attractive (p = 0.001, β = 0.38, s.e. = 0.11). Finally, the three-way interaction between partner attractiveness, participant sex and partner sex (\(\chi^{2}(2) = 22.26\), p < 0.001) showed that both males and females rated moderately attractive females as more attractive as compared to moderately attractive males (p ≤ 0.001, β ≥ 0.35, s.e. ≤ 0.11). Moreover, males rated unattractive females as less attractive as compared to unattractive males (p < 0.001, β = -0.42, s.e. = 0.11). By contrast, males rated attractive female partners as more attractive as compared to attractive male partners (p = 0.001, β = 0.55, s.e. = 0.11). Female participants did not differ in their ratings between unattractive females as compared to unattractive males as well as attractive females as compared to attractive males (p ≥ 0.767).

Decision Making: Confirmatory Results

The main effects and interactions of all four games are summarized in Table 4 (\(\chi^{2}\) and p-values; see Footnote 2). In the following, we report the coefficients of the significant comparisons, with p-values ≤ 0.05 considered significant.

Table 4 Main effects and interactions of the confirmatory mixed effects models analyses. The predictors that were included in all four games are displayed with all interaction terms (i.e., participant sex, partner attractiveness, partner sex, and the selected participant personality trait). For the ultimatum game, we only added significant main effects and interactions with the offer

Dictator Game

As illustrated in Fig. 1, in the DG (\(R_{m}^{2} = .11\)), attractive partners received more money compared to moderately attractive (p < 0.001, β = 0.38, s.e. = 0.06) and unattractive partners (p < 0.001, β = 0.59, s.e. = 0.06), and moderately attractive partners received more money compared to unattractive partners (p < 0.001, β = 0.21, s.e. = 0.06). This main effect was qualified by a two-way interaction between partner attractiveness and partner sex, which showed that both attractive and unattractive females received more money as compared to their male equivalents (p ≤ 0.020, β ≥ 0.19, s.e. = 0.08). Regarding personality, we found that individuals with high trait agreeableness (see Fig. 2) offered more money to their interaction partners (p < 0.001, β = 2.11, s.e. = 0.40). We found a three-way interaction between participant sex, partner attractiveness, and partner sex. Post-hoc comparisons revealed that for male participants the difference in offered money between attractive and unattractive partners was greater for female partners compared to male partners (p = 0.030, β = 0.42, s.e. = 0.16). In contrast, for female participants the difference between attractive and unattractive partners was greater for males as compared to females (p = 0.040, β = 0.40, s.e. = 0.16). Finally, the significant three-way interaction between partner attractiveness, partner sex, and agreeableness showed that with increasing levels of agreeableness, moderately attractive male partners were offered increasingly more money as compared to attractive male partners (p = 0.043, β = 0.46, s.e. = 0.19). In addition, with increasing trait agreeableness, we found a decrease in the difference between attractive female partners as compared to moderately attractive (p = 0.014, β = -0.53, s.e. = 0.19) and compared to unattractive female partners (p = 0.003, β = -0.62, s.e. = 0.19).

Fig. 1 Behavioral responses in the four decision-making tasks. The outcomes in the respective games are grouped by the attractiveness (high, moderate, and low) and the sex (male and female) of the interaction partner. Error bars indicate the standard error of the mean

Fig. 2 Main effects (confirmatory analyses) of trait agreeableness (mean centred) in the dictator game, trust game, ultimatum game, and prisoner’s dilemma. The shaded areas represent the 95% Confidence Interval

Trust Game

In the TG (\(R_{m}^{2} = .08\)), as in the DG, attractive partners received more money compared to moderately attractive (p < 0.001, β = 0.55, s.e. = 0.07) and unattractive partners (p < 0.001, β = 1.04, s.e. = 0.07), and unattractive partners received less money compared to moderately attractive partners (p < 0.001, β = -0.48, s.e. = 0.07; see Fig. 1). Female partners were entrusted with more money than male partners (p = 0.006, β = 0.16, s.e. = 0.06), and male participants generally entrusted more than female participants (p = 0.014, β = 0.83, s.e. = 0.34). Regarding personality, we found that individuals with high trait agreeableness (see Fig. 2) entrusted higher amounts of money (p < 0.001, β = 1.69, s.e. = 0.53). The two-way interaction between partner sex and partner attractiveness showed that moderately attractive females were entrusted with more money as compared to their male equivalents (p < 0.001, β = 0.43, s.e. = 0.10).

Ultimatum Game

For the UG (\(R_{m}^{2} = .32\)), the main effect of offer was significant, indicating higher acceptance rates with increasing offer size: 5 € offers were accepted more often compared to 3 € (p < 0.001, β = 3.75, s.e. = 0.15) and 1 € (p < 0.001, β = 6.75, s.e. = 0.18), and 3 € offers were accepted more often compared to 1 € (p < 0.001, β = 3.00, s.e. = 0.11). Moreover, the main effect of agreeableness showed a positive effect on acceptance rates (p = 0.039, β = 1.87, s.e. = 0.41). The two-way interaction between partner attractiveness and partner sex showed no significant post-hoc comparisons. The two-way interaction between partner attractiveness and participant gender revealed that the difference between attractive and moderately attractive partners (p = 0.035, β = 0.66, s.e. = 0.26) as well as between attractive and unattractive partners (p = 0.019, β = 0.65, s.e. = 0.24) was greater for male participants than for female participants. The two-way interaction between participant gender and trait agreeableness showed that with increasing levels of agreeableness males showed an increase in accepted offers, whereas females showed a decrease in accepted offers with increasing agreeableness (p = 0.021, β = 3.03, s.e. = 1.32). Offer was further qualified by a three-way interaction with participant sex and trait agreeableness. Simple slopes analysis showed that for the 5 € (p < 0.001, β = 5.58, s.e. = 1.42) and the 3 € offer (p = 0.029, β = 2.91, s.e. = 1.33), male participants with higher levels of trait agreeableness accepted increasingly more offers as compared to female participants, who showed a decrease in acceptance rates with increasing levels of trait agreeableness.
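To make the size of the offer effect more tangible, coefficients of a logistic model can be converted to odds ratios. The sketch below assumes that the β values reported for the UG are unstandardized contrasts on the log-odds scale, which the text does not state explicitly; the helper name is chosen for illustration.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


# Offer contrasts reported for the UG (assumed to be on the log-odds scale).
contrasts = {"5 vs 3 EUR": 3.75, "3 vs 1 EUR": 3.00, "5 vs 1 EUR": 6.75}
for label, beta in contrasts.items():
    print(f"{label}: odds ratio = {odds_ratio(beta):.1f}")
```

Under this assumption, the odds of accepting a 3 € offer would be roughly 20 times the odds of accepting a 1 € offer. The three contrasts are also internally consistent, since 3.75 + 3.00 = 6.75 on the log-odds scale.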

Prisoner’s Dilemma

In the PD (\(R_{m}^{2} = .14\)), participants were more likely to cooperate with attractive partners compared to moderately attractive (p < 0.001, β = 1.07, s.e. = 0.13) and unattractive partners (p < 0.001, β = 2.06, s.e. = 0.13). In addition, participants cooperated more with moderately attractive as compared to unattractive partners (p < 0.001, β = 0.99, s.e. = 0.12). The two-way interaction between participant sex and partner sex indicated that female participants cooperated significantly less with male as compared to female partners (p = 0.017, β = -0.34, s.e. = 0.14). The two-way interaction between partner attractiveness and partner sex showed that participants cooperated more often with moderately attractive female partners as compared to moderately attractive male partners (p < 0.001, β = 0.64, s.e. = 0.17). In contrast, participants cooperated more often with unattractive male partners as compared to unattractive female partners (p = 0.025, β = 0.37, s.e. = 0.17). These two-way interactions were all qualified by a three-way interaction between participant sex, partner attractiveness and partner sex, which revealed that for male participants, the difference in cooperation rates between attractive and unattractive partners was greater for female as compared to male partners (p = 0.021, β = 1.03, s.e. = 0.38). Similarly, for female participants, the difference in cooperation rates between attractive and moderately attractive partners was greater for male partners as compared to female partners (p = 0.001, β = 1.40, s.e. = 0.37). For male and female participants, the difference in cooperation rates between moderately attractive and unattractive partners was greater for female partners as compared to male partners (p values ≤ 0.017, β values ≥ 0.94, s.e. = 0.33). Regarding personality, we found a main effect of trait agreeableness (Fig. 2), indicating higher cooperation rates with increasing levels of agreeableness (p = 0.013, β = 1.52, s.e. = 0.61).

Decision Making: Exploratory Results

According to the model selection based on AICc (see Table 5), the model including trait agreeableness was also the model with the lowest AICc for the DG; its results are therefore not repeated in this section.

Table 5 Corrected Akaike information criteria (AICc) of different models for the four paradigms. We tested the models with random intercept only (“intercept”), the models with the categorical predictors only (“factors”; i.e., participant sex, partner attractiveness, partner sex, and offer in the ultimatum game), and the models with factors and the agreeableness facets or the personality factor agreeableness. The models with the lowest AICc (printed in bold) were selected for the results section
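The selection rule described in the caption can be illustrated with a minimal sketch. It uses the standard small-sample correction AICc = AIC + 2k(k + 1)/(n − k − 1) with AIC = 2k − 2 log L. All log-likelihoods, parameter counts and the sample size below are hypothetical placeholders and not values from Table 5; how n and k were counted for the mixed models is also an assumption here.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion.

    log_likelihood: maximised log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations (how n is counted for mixed models is an assumption)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidate models: (log-likelihood, number of parameters).
# These numbers are invented for illustration only.
candidates = {
    "intercept": (-1250.0, 2),
    "factors": (-1180.0, 13),
    "factors + agreeableness": (-1172.0, 14),
}
n_obs = 2000  # hypothetical number of observations

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # the model with the lowest AICc is selected
print(best, round(scores[best], 2))
```

The correction term penalises additional parameters more strongly when n is small relative to k, so a more complex model is only preferred if its gain in likelihood outweighs that penalty.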

Trust Game

Following the model selection, we analysed the model with trait sympathy as the predictor (\(R_{m}^{2} = .08\)). In addition to the effects reported for the confirmatory model with trait agreeableness, participants entrusted larger amounts of money with increasing trait sympathy (see Fig. 3; p < 0.001, β = 1.23, s.e. = 0.37). The two-way interaction between partner sex and partner attractiveness showed that both attractive and moderately attractive females were entrusted more money as compared to their male equivalents (p ≤ 0.003, β ≥ 0.29, s.e. = 0.10). Follow-up comparisons for the three-way interaction between participant sex, partner attractiveness and trait sympathy did not reach significance.

Fig. 3 Main effects (exploratory analyses) of the personality traits (mean centred) in the trust game, ultimatum game, and prisoner’s dilemma. The shaded areas represent the 95% confidence interval

Ultimatum Game

For the UG, the model including trait cooperation revealed the lowest AICc (\(R_{m}^{2} = .32\)). In contrast to the effects reported for the confirmatory model with trait agreeableness, we found a three-way interaction between partner attractiveness, partner sex and participant sex. Only for male participants was the difference in acceptance rates between attractive and unattractive partners (p = 0.001, β = 0.98, s.e. = 0.26) as well as between moderately attractive and unattractive partners (p = 0.002, β = 0.83, s.e. = 0.25) greater for female partners as compared to male partners.

Prisoner’s Dilemma

In the PD, we selected the model with trait trust as predictor (\(R_{m}^{2} = .14\)). In addition to the effects reported for the confirmatory model with trait agreeableness, we found a significant two-way interaction between participant sex and partner attractiveness, for which none of the post-hoc comparisons was significant. The main effect of trait trust (Fig. 3) indicated higher cooperation rates with increasing levels of trust (p = 0.05, β = 0.87, s.e. = 0.44).

Discussion

We investigated how the attractiveness and the sex of a social interaction partner affect decision making in four different social and economic paradigms depending on the participants’ sex. To evaluate different aspects of a social interaction, we chose the Dictator Game, Trust Game, Ultimatum Game, and Prisoner's Dilemma. Moreover, we examined how the Big Five personality factor agreeableness interacts with decision making and which particular facet of agreeableness is predictive in the different paradigms.

As expected, participants perceived the attractiveness of their counterparts in line with the intended attractiveness category. Moreover, men rated the range of opposite-sex counterparts’ attractiveness as broader than the range of same-sex partners’ attractiveness. Men thus rated unattractive females as less attractive than unattractive males, whereas attractive females were rated as more attractive than attractive males. Women made no such sex distinctions in the category of attractive and unattractive partners. Hence, men seem to be more judgmental than women towards the partner’s attractiveness when facing a different-sex partner compared to same-sex partners. This is consistent with previous findings (Levy et al., 2008), where men (contrary to women) rated beautiful women as more attractive than beautiful men, which also correlated with enhanced motivational effort for viewing attractive females.

In line with our first hypothesis, we were able to show that in the TG, DG and PD, there was both a clear beauty premium and a plainness penalty, as attractive individuals received more money and higher cooperation rates, whereas unattractive individuals received less money and lower cooperation rates in comparison with moderately attractive individuals. Even in the UG, both a beauty premium and a plainness penalty could be observed for female proposers when the receiver was male. These findings strengthen the concept of a beauty premium and a plainness penalty, which were first described by Hamermesh and Biddle (1993) and further supported by a large body of evidence (e.g., Ma & Hu, 2015; Solnick & Schweitzer, 1999; Wilson & Eckel, 2006). It may be hypothesized that participants make more beneficial economic decisions towards more attractive individuals of both sexes in order to promote positive social relations with them due to their expected qualities (Andreoni & Petrie, 2008; Boyatzis et al., 1998; Dion et al., 1972; Eagly et al., 1991; Feingold, 1992; Langlois et al., 2000; Shinada & Yamagishi, 2014b; Wilson & Eckel, 2006). Accordingly, participants are supposedly less interested in positive interactions and exhibit a lower monetary investment if the social counterpart is of low attractiveness. However, unfair offers from attractive individuals were not rejected more often than unfair offers from less attractive individuals; thus, no evidence for a beauty penalty was found. This contradicts our second hypothesis, which was based on the previous findings of Eckel and Wilson (2004) and Andreoni and Petrie (2008), who found that attractive individuals who disappointed the participants’ expectations were punished more harshly in a TG and a public goods dilemma, respectively. One has to take into account, though, that in the UG, attractiveness in general seemed to be far less relevant for decision making than the size of the offer.
We found no main effect of attractiveness (and only minor advantages for attractive women compared to moderately attractive ones and for moderately attractive women compared to unattractive women, when the participant was male), which could explain the absence of a beauty penalty as well. Having identified attractiveness as an important factor in social and economic decisions, further research should focus on means to overcome this beauty gap. Moreover, as the beauty gap appears to be greater for women compared to men, unattractive women face a twofold discrimination. While society spends much (well invested) time and energy on discussing how to overcome the gender gap, it also needs to discuss how to deal with this kind of discrimination in everyday and work life. In a recent study placing participants in a hiring position, Tu et al. (2021) found a means to level the gap in an economic context. When unattractive individuals were asked to adopt a powerful body posture, they were rated as being more nonverbally present and the initially found disadvantage in hireability diminished. However, this is not an overarching resolution and may not pay off in social encounters.

Delving into the influence of personality, we could show that participants scoring higher on agreeableness as an overall trait tended to be more altruistic, trusting and cooperative. This is consistent with a variety of previous studies that found agreeableness to be positively linked to cooperation and generosity (e.g., Kagel & McGee, 2014; Koole et al., 2001; Volk et al., 2011; Zhao & Smillie, 2015). Surprisingly and contrary to our third hypothesis, agreeableness did not lead to an increase in decisions in favor of attractive individuals, but even reduced the payment gap between the most and least attractive partners in the DG. As agreeableness was found to play an important role in the inhibition of affect and emotion control (Ode et al., 2008; Robinson, 2007) and the suppression of hostile thoughts (Meier et al., 2006), more agreeable participants may inhibit the urge to favor or discriminate against counterparts exclusively based on their (un)attractiveness. However, high levels of agreeableness did not affect the payment and cooperation gap in three out of the four games, but solely led to higher rates of cooperation and payment for all counterparts, regardless of their attractiveness. The abovementioned explanatory approach is thus not completely satisfactory and further research is required. Interestingly, increasing agreeableness scores in women led to decreasing acceptance rates of high and medium offers in the UG. This was not the case for men, who were more likely to accept high offers when scoring high in agreeableness. Further research is needed to determine whether this interaction follows a systematic mechanism or appeared incidentally in our paradigm.

As hypothesized, women benefited more from their attractiveness than men most of the time, adding to a large body of evidence (Busetta et al., 2013; French, 2002; Kahn et al., 1971; Maestripieri et al., 2017). However, as women were also perceived as more attractive, the origin of this pro-femaleness bias may lie in their attractiveness rather than in their sex. In addition to the pro-femaleness bias, we found evidence for the predicted opposite-sex bias in ratings. The opposite-sex bias was especially large for male participants, who preferred attractive female counterparts over attractive male counterparts. This sex difference has already been described in similar studies (e.g., Bhogal et al., 2016) and has also been explained from an evolutionary perspective. While men prefer female mates who show high reproductive value, and thus attractiveness, women favor males who present themselves as cooperative and altruistic (see Buss, 1989, for a more detailed discussion). It thus makes sense that males behave in ways that signal resource acquisition, e.g., altruism, generosity, and cooperation, when facing highly attractive females. However, in our economic games an opposite-sex bias that depends on attractiveness was only found in the DG and PD, as evidenced by the three-way interaction of attractiveness, partner sex and participant sex. In these cases, both men and women made relatively more beneficial economic decisions towards more attractive opposite-sex counterparts. This may be linked to mechanisms of mating behavior in both gender groups and could have an evolutionary background, with attractiveness signaling health and fertility to the opposite sex (Maestripieri et al., 2017).

In a recent review, Kou et al. (2020) discuss the underlying cognitive mechanisms influencing the processing of facial attractiveness. They also argue that evolutionary processes may play an important role in both the opposite-sex bias and the femaleness bias when processing differently attractive faces. However, they could not conclusively determine whether the “female beauty captures attention” or the “opposite-sex beauty captures attention” hypothesis is more likely.

Interestingly, while unattractive women received more money than unattractive men in the DG and in the TG, they faced disadvantages in the PD, where participants cooperated less often with them than with their male counterparts. The reason for these dissimilarities may lie in the differences between the paradigms. Only in the PD do participants rely on their partner’s willingness to cooperate. As they have no other cue than their counterpart’s physical appearance when deciding whether or not to cooperate, the detrimental bias against unattractive counterparts seems to be stronger when facing women than men. In the other paradigms, participants were more generous towards unattractive women than men.

In our exploratory analyses we examined which specific facets are especially predictive for decision making in the different paradigms. Concerning the UG and the PD, the facet trust predicted cooperation and acceptance rates to the highest degree. In the TG, increasing scores on the facet of sympathy led to higher amounts of entrusted money. This is counterintuitive, as both the TG and the trust subscale are supposed to measure trust as a construct and should thus correlate highly. Likewise, in the PD and the UG, one would intuitively expect cooperation to have a stronger influence on decision making than trust. This raises the question of whether the subscales of agreeableness measure distinguishable facets or whether their intercorrelation is too high to actually differentiate between the constructs. It also underlines the importance of including several paradigms and subscales to explore the mechanisms underlying the interaction of attractiveness, the facets of agreeableness and the decision to behave in cooperative, trusting, and altruistic ways.

Taking all the results presented above into account, we found strong support for both a beauty premium and a plainness penalty, whereas a beauty penalty could not be observed. Evidence was also found for a stronger pro-attractiveness bias for women compared to men, which is in line with a variety of studies. Interestingly, increasing agreeableness did not lead to stronger benefits for attractive counterparts, but rather reduced the beauty gap. Furthermore, different facets of agreeableness seemed especially predictive in different economic games. Including multiple games and multiple facets of agreeableness in our study led to a more differentiated and sounder outcome than we would have found with only one specific paradigm.

As a limitation, we want to point out the attractiveness differences between our counterparts. Both women and men rated moderately attractive females as more attractive than moderately attractive males. This could bias the effects in favor of women and diminish generalizability. While the reliabilities of the faces were particularly high in the TG and DG (all values of α ≥ 0.85) and acceptable in the UG (most values of α > 0.7), most reliabilities in the PD fell below the critical value of 0.7, as participants differed more strongly in their decisions whether or not to cooperate with the differently attractive counterparts. Thus, the results for the PD should be interpreted with caution due to their limited consistency.
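For readers unfamiliar with the reliability coefficient referred to above, the following minimal sketch computes Cronbach's α from a set of items using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The data are invented for illustration, and treating each face of a category as an "item" and each participant as a respondent is an assumption about how the reliabilities were set up, not something the text specifies.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: list of lists, one inner list per item (e.g. per face), each
    holding one score per respondent in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items need the same number of respondents")
    # Sum score of each respondent across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)  # sample variances
    total_var = variance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical decisions of five respondents towards three faces of one category
faces = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
alpha = cronbach_alpha(faces)
print(f"alpha = {alpha:.2f}")
```

With binary cooperate/defect decisions, as in the PD, item variances are bounded and decisions towards faces of the same category may covary less, which is one possible reason why α could turn out lower there than in games with continuous payments.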

To simplify our paradigm, we only included pictures of white, young to middle-aged faces. Future studies should include other races and ages (in both counterparts and participants) to increase generalizability, as social proximity was found to influence social decisions (Balliet et al., 2014). Moreover, further research is necessary to examine whether the effects transfer to face-to-face situations.

Conclusion

Across a variety of economic decision-making situations, we were able to show a more generous attitude towards attractive people. The pro-attractiveness bias was stronger for women, especially when men were the favoring subjects. While agreeableness led to higher rates of cooperation, generosity and altruism, more research is needed to examine the mechanism behind this effect, as the results were heterogeneous across gender and different economic games.