Judging attractiveness: Biases due to raters’ own attractiveness and intelligence

Abstract
Tennis and Dabbs (1975) reported that physically attractive males showed a positivity bias when rating the attractiveness of others, whereas the opposite pattern was observed for females. We attempted to replicate and extend these findings by (1) using self-assessed attractiveness rather than the observer-rated attractiveness measure used in previous research, (2) using face-to-face interactions with targets rather than photographs, and (3) examining the effect of another ego-involving attribute: intelligence. Consistent with previous research, attractiveness judgments made by men, but not women, correlated positively with their own self-perceived attractiveness (r = .51, p < .001). Attractiveness judgments made by women, but not men, correlated negatively with their intelligence (r = −.32, p = .001). Judgments of attractiveness are thus biased by a rater's own attributes (e.g. attractiveness and intelligence), but these effects do not generalize across men and women raters and may be driven by different mechanisms.


ABOUT THE AUTHORS
Our lab's research interests involve identifying the interactional behaviors, interpersonal competencies, and personality traits related to emotional intelligence and interpersonal sensitivity. This report comes from a large longitudinal project, now in its eighth year, in which groups of participants were intensively observed and assessed over a period of 10 weeks. Measures included personality tests, aptitude tests, communication skill tests, and interpersonal activities. The general aim of the project was not to test one or two precise causal relationships. Instead, it provides a preliminary examination of, and possibly support for, many research hypotheses that warrant the investment of resources required to perform more rigorous tests. This investigation presents one such hypothesis. The particular application here involves the assessment of physical attractiveness; the broader, more important issue concerns the validity of relying on human judgment and self-reports as psychological assessment instruments.

PUBLIC INTEREST STATEMENT
Although every sector of industry, education, and government relies on human judgment to assess others, very little is known about the specific biases that undermine its validity. Human resource managers routinely judge the overall competency and organizational fit of every job applicant. Supervisors are often required to assess the professional appropriateness of their employees' behavior and appearance. But these assessments are subjective, vaguely defined, and often subject to disagreement. More importantly, we know them to be vulnerable to cognitive and motivational biases, many of which remain unidentified and unknown to lay people and social scientists alike. One of these biases is the attractiveness halo, in which the positivity of judgments on nearly every attribute is correlated with the target's physical attractiveness. To the extent that our assessment of others is affected by their attractiveness, it is important to determine the factors that influence how we judge the attractiveness of others.

Introduction
Are people evaluated by the content of their character or by the beauty of their skin? If the latter, then what affects our judgment of a person's beauty? It has long been established that people unwittingly evaluate others using a judgment policy that might be described as "what is beautiful is good" (e.g. Dion, Berscheid, & Walster, 1972; Feingold, 1992). An attractiveness halo often affects our evaluations of people's competency (Clifford & Walster, 1973; Dion et al., 1972; Jackson, Hunter, & Hodge, 1995), personality and integrity (Berscheid & Walster, 1974; Stewart, 1980), and even their emotional well-being (Diener, Wolsic, & Fujita, 1995). Although the strength, robustness, and generalizability of this judgment bias have not been fully worked out (e.g. Eagly, Ashmore, Makhijani, & Longo, 1991; Jackson et al., 1995), there is little doubt that whenever a person needs to assess or evaluate another human being, their judgments will be influenced by physical attractiveness regardless of its relevance to the attribute being judged. To the extent this is true, understanding the processes by which people judge the attractiveness of others is critical to understanding how people evaluate others on everything else.
Researchers have consistently demonstrated that attractiveness judgments are affected by context (e.g. Cash, Cash, & Butters, 1983; Kenrick & Gutierres, 1980; Melamed & Moss, 1975; Weaver, Masland, & Zillmann, 1984). Participants exposed to photographs and video clips of attractive models and television stars rated subsequent targets, and themselves, as less attractive than did participants exposed to unattractive others. Collectively, these studies provide strong support for the subjectivity of attractiveness judgments, suggesting that attractiveness is not a precisely defined, objective label that can be reliably applied to people. Although researchers have meticulously studied how attractiveness judgments and related outcome variables are influenced by recent perceptions of others' attractiveness (e.g. Cash et al., 1983; Kenrick & Gutierres, 1980; Melamed & Moss, 1975; Weaver et al., 1984), few have investigated the individual differences that might exist in the attractiveness perception process (Tennis & Dabbs, 1975).
More generally, it is well known that much of the variance in person perception, perhaps as much as 20%, is due to differences between perceivers (Kenny, 1994). As a result, researchers studying judgments of other human attributes (e.g. personality, intelligence) have made substantial efforts to examine individual differences in how those attributes are perceived and assessed (e.g. Davis & Kraus, 1997; Lippa & Dietz, 2000). The objective of the current study was therefore to identify potential individual-difference effects (i.e. the rater's own physical attractiveness and intelligence) in the judgment of another's beauty.
One major bias in person perception that could affect attractiveness ratings is the assumed similarity effect (Cronbach, 1955), in which a rater's assessment of others is anchored on their assessment of themselves. Raters who are high on an attribute tend to rate others highly on that attribute; likewise, raters who are low on a given attribute are likely to rate others low on it as well. Tennis and Dabbs (1975) sought to determine whether this effect influenced judgments of attractiveness. Specifically, they examined whether judgments of attractiveness correlated positively with the raters' own level of attractiveness. Men's ratings of targets' attractiveness did correlate positively with their own level of attractiveness. However, the opposite was true of women: those who were more attractive gave more negative ratings of their targets. Tennis and Dabbs (1975) speculated that less attractive females who valued attractiveness may be reluctant to assign "very unattractive" ratings to others because of what such ratings would imply about themselves as unattractive females. Attractive females, on the other hand, might be harsher judges because doing so would enhance their relative attractiveness advantage over others.
Presumably, these effects are a consequence of raters' awareness of how attractive they are; otherwise, a mechanism explaining them would be difficult to imagine. However, Tennis and Dabbs (1975) operationalized rater attractiveness in terms of research assistants' judgments rather than self-reports, so we do not know whether the attractive individuals in that study actually felt attractive. We assume they did, but this was never established. It is possible that the effects reported by Tennis and Dabbs (1975) would have been stronger had they asked raters to report their own level of attractiveness. Thus, the current investigation operationalized rater attractiveness in terms of self-perceived attractiveness. The idea was to capture how attractive our perceivers feel when they look at themselves in a mirror.

Present research
The objective of this research was to replicate and extend the work of Tennis and Dabbs (1975) in three ways. First, we used self-ratings of attractiveness. If raters use their own level of attractiveness as an anchor or standard against which to judge another's attractiveness, then their actual attractiveness is not nearly as important as their self-perception. In other words, the reference point is not how attractive raters are, but how attractive they perceive themselves to be.
Second, we employed a face-to-face interactional context rather than relying on passive ratings of photographs. According to Tickle-Degnen and Rosenthal (2007), making judgments in these two modes is experientially very different. Photographs do not encode the rich repertoire of expressive behavior and charisma that influences and enhances interpersonal attraction. Face-to-face judgments of attractiveness are likely to encompass other factors such as personality (e.g. warm, nice people are more attractive than cold, mean people) and value congruence (e.g. like attracts like). As a result, studies that assess attractiveness from photographs may underestimate its true impact. Brown and Bernieri (2014) tested this hypothesis directly by investigating the link between narcissism and physical attractiveness using both methodologies (face-to-face ratings versus ratings of video clips). Hypothesized relationships between narcissism and attractiveness were stronger when physical attractiveness was judged face-to-face than when it was judged from video recordings. The authors proposed that attractiveness is a complex and nuanced construct involving expressive behavior that is not captured when one attempts to assess it from static, posed photographs (Alicke, Smith, & Klotz, 1986; Brown & Bernieri, 2014; Riggio, Widaman, Tucker, & Salinas, 1991; Ritts, Patterson, & Tubbs, 1992).
Finally, we examined whether the observed rating effects would generalize to another attribute of the rater (i.e. intelligence), which might suggest an alternative mechanism involving ego maintenance. Social comparison theory (Festinger, 1954) posits that our evaluations of others affect our own self-esteem. Comparing ourselves to others who are worse off boosts our self-esteem, whereas comparing ourselves to others who are better off is ego-threatening (Aspinwall & Taylor, 1993). In contrast, the anchoring mechanism proposed by Tennis and Dabbs (1975) has no motivational component. If anchoring is a purely information-driven phenomenon, individual differences in the perceiver's intelligence should not affect attractiveness ratings at all. However, if a perceiver's intelligence (another positive attribute) affects their ratings of others' attractiveness in the same way that their own attractiveness does, this would imply the involvement of a motivational mechanism. In other words, to the extent that intelligence serves as a buffer for one's identity and self-esteem, intelligent individuals may feel less threatened by others' positive attributes and rate others higher on physical attractiveness.
In summary, based on Tennis and Dabbs' (1975) findings, we predicted that men's ratings of others' attractiveness would correlate with ratings of their own attractiveness. However, we did not expect to see this pattern in women. Although we had no confident predictions for rater intelligence, we felt that if it showed an assumed similarity pattern then it would support a more general self-esteem maintenance mechanism for the attractiveness judgment process.

Participants
Participants were 161 undergraduate students enrolled in a 10-week "Psychological Assessment" research practicum. Participants were treated in accordance with the "Ethical Principles of Psychologists and Code of Conduct" (American Psychological Association, 2002). The learning objectives of the practicum involved gaining first-hand experience in the assessment process and becoming more acquainted with issues of validity. Throughout the practicum, participants received feedback on many of the measures they had completed and heard formal lectures describing the psychometric properties of the instruments used. Although attendance was required in this Pass/Fail practicum, participants were told that their behavior, performance, and responses to all items would not in any way affect their academic standing.

Procedure
On the first day, unacquainted participants were randomly assigned to groups of five to seven students, within which they provided zero-acquaintance ratings of each other (e.g. Albright, Kenny, & Malloy, 1988; Kenny, Horner, Kashy, & Chu, 1992; Passini & Norman, 1966; Paunonen, 1989; Watson, 1989). They were instructed to rate group members' personality using the Ten-Item Personality Inventory (Gosling, Rentfrow, & Swann, 2003) while refraining from any communication. These data were not used in the current study and will not be discussed further. Participants were asked to avoid all contact with their group members so that they would remain unacquainted until the next experimental session. At that session, 48 hours later, participants interacted one-on-one with each other in round-robin fashion (see Figure 1). After each conversation, participants confidentially provided attractiveness ratings of themselves and their partners. Weeks later, all participants completed three tests of intelligence.
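The paper does not describe how the order of the one-on-one conversations was scheduled. One standard way to arrange a round-robin so that every pair in a group meets exactly once is the "circle method"; the sketch below is a hypothetical illustration of that technique, not the authors' procedure, assuming groups of five to seven members and a sit-out slot (None) for odd-sized groups.

```python
def round_robin_rounds(members):
    """Circle-method round-robin: every pair of members meets exactly once.

    For an odd-sized group a placeholder (None) is added; whoever is paired
    with the placeholder sits out that round (hypothetical scheduling rule).
    """
    players = list(members)
    if len(players) % 2 == 1:
        players.append(None)  # bye placeholder for odd-sized groups
    n = len(players)
    rounds = []
    for _ in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = players[i], players[n - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first seat fixed and rotate everyone else by one position.
        players = [players[0]] + [players[-1]] + players[1:-1]
    return rounds


# Illustrative five-person group with hypothetical member labels.
for k, pairs in enumerate(round_robin_rounds(["A", "B", "C", "D", "E"]), start=1):
    print(f"Round {k}: {pairs}")
```

For a group of five this yields five rounds of two conversations each (one person sits out per round), covering all ten pairs; a group of seven needs seven rounds of three conversations.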

Measures
Participants rated targets' natural beauty, which we defined as follows: "Natural beauty is described as what is left after sexiness and hotness have been removed. A rating of 72 means this person is high in natural beauty, 1 indicates this person is low in natural beauty, 36 is average." Participants also rated targets' vocal attractiveness and sexiness/hotness on the same 72-point scale. Ratings of natural beauty and sexiness/hotness were strongly positively correlated (r = .80, p < .001), suggesting that participants did not discriminate much between these two constructs.

Figure 1. Five-minute interaction between dyads.
Although a congruent measure of "sexual attractiveness" might at first seem more appropriate, we opted against it because doing so would exclude the many participants who were not (1) heterosexual, (2) in the same age range as their interacting partner, or (3) of the opposite sex to their interacting partner. We therefore selected the "natural beauty" measure for this study in order to use the widest, most inclusive sample possible.
For self-assessments of attractiveness, however, we wanted to reduce the impact of idiosyncratic factors specific to that particular day, such as how much sleep participants had had the night before, their mood, or their choice of clothing or make-up. We wanted a more stable, robust, and reliable assessment of their baseline level of attractiveness. In addition, we wanted to capture the self-assessment process that takes place whenever one looks in the mirror and tries to evaluate how attractive one feels. Within our sample of university students, we reasoned that a substantial proportion of their satisfaction (or distress) over how good they look probably involves an estimate of their desirability as a dating partner. Therefore, we asked participants to assess themselves by taking the perspective of another. Specifically, we asked: "On the last day of class, how sexually attracted will person A be to you?"
Participants rated this item separately with respect to each member of their group, up to six times. This measure was rated on a 1-8 scale.
Three objective tests were combined to serve as our intelligence criterion. The first was Raven's Progressive Matrices (Raven, Raven, & Court, 2000), a well-established nonverbal test of intelligence that correlates positively with the Wechsler and Stanford-Binet scales (r = .50 to .70; Strauss, Sherman, & Spreen, 2006). The second was the Otis Quick-Scoring Mental Ability Test (Gamma Test; Otis, 1954; Appendix A), which contains a broad array of verbal, graphical, and quantitative questions. It has a corrected split-half reliability coefficient of .88 (Otis, 1954) and high concurrent validity with the Terman Group Test of Mental Ability (Terman, 1920; r = .75; as cited in Grant, 1961), which is a very slightly modified version of the Army Alpha intelligence test (Jones & Thissen, 2007, p. 6). Although the Otis Quick-Scoring Mental Ability Test (Otis, 1954) has not been used in recently published studies, the Army Alpha intelligence test has been employed relatively recently in emotional intelligence research (e.g. Mayer, Caruso, & Salovey, 1999). The third measure was a vocabulary test (Appendix B), since vocabulary is considered by some to be the best measure of intelligence (e.g. Rhodes et al., 1995). All three intelligence tests correlated positively and significantly with each other (r = .24 to .49), so we created a single global composite measure of intelligence by converting the three test scores to z-scores and averaging them.
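The composite described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring script: the list-based data layout and the function names are hypothetical, complete data on all three tests is assumed, and the sample standard deviation is used (the paper does not say which). With complete data, choosing the sample or population SD rescales every composite score by the same constant, so correlations with the composite are unaffected.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize raw scores to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def intelligence_composite(raven, otis, vocab):
    """Average each participant's three z-scores into one composite.

    The three lists hold raw scores in the same participant order
    (hypothetical layout; assumes no missing data).
    """
    standardized = zip(z_scores(raven), z_scores(otis), z_scores(vocab))
    return [mean(triple) for triple in standardized]


# Illustrative (made-up) raw scores for three participants.
print(intelligence_composite([10, 20, 30], [1, 2, 3], [5, 5.5, 6]))
# → [-1.0, 0.0, 1.0]
```

Standardizing first matters because the three tests use different raw-score ranges; averaging raw scores would let the test with the largest spread dominate the composite.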

Results
Descriptive statistics are presented in Table 1. Each rater judged four to six others on their attractiveness. Because we were interested in individual differences between raters across all targets, we used the average judgment made by each rater in these analyses. Overall, men rated the attractiveness of others lower (M = 41.57) than did women (M = 46.27), t(157) = −3.00, p = .0031. Table 1 also presents self-ratings. Overall, men rated themselves higher in self-perceived attractiveness (M = 3.35) than did women (M = 2.56), t(157) = 3.48, p = .0007. Notably, whereas only 2 males gave themselves the lowest possible self-perceived attractiveness rating, 14 females rated themselves that low (see Figures 2 and 3). These findings are consistent with past studies reporting that men tend to overestimate their attractiveness to others, including others' sexual interest in them (e.g. Abbey, 1982; Haselton & Buss, 2000; Saad & Gill, 2009).
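Because each rater judged four to six targets, the unit of analysis is the rater. A minimal sketch of the two steps described above follows, assuming ratings are stored as hypothetical (rater, target, score) tuples and that the sex comparison was a pooled-variance Student t test. The reported t(157) is consistent with such a test on 159 raters, but the paper does not name the test. Only the t statistic and degrees of freedom are computed here; a p-value would come from the t distribution (for example, SciPy's `scipy.stats.ttest_ind`, whose default assumes equal variances).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def rater_means(ratings):
    """Average each rater's judgments across all the targets they rated.

    `ratings` is an iterable of (rater_id, target_id, score) tuples
    (hypothetical storage format).
    """
    by_rater = defaultdict(list)
    for rater, _target, score in ratings:
        by_rater[rater].append(score)
    return {rater: mean(scores) for rater, scores in by_rater.items()}


def pooled_t(group1, group2):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

Usage: compute `rater_means` once, split the resulting per-rater averages by rater sex, and pass the two lists to `pooled_t`. Averaging first keeps raters who judged six targets from counting more heavily than raters who judged four.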
We hypothesized that participants' self-perceived attractiveness would be positively correlated with their judgments of others' attractiveness for men, but not for women. Although we had no clear prediction for intelligence, we reasoned that if intelligence followed the same pattern as rater attractiveness, the results would suggest the involvement of social comparison processes. Consistent with Tennis and Dabbs' (1975) findings, we observed a positive correlation between men's self-perceived attractiveness and their ratings of others' attractiveness (r = .51, p < .0001). Consistent with the assumed similarity effect (Cronbach, 1955), men's ratings of others' attractiveness appeared to be anchored on their perceptions of their own attractiveness. No such effect was observed among women (r = .09, p = .36).
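The within-sex correlations reported here can be sketched with the standard library alone. This is an illustrative sketch, not the authors' analysis code: the per-rater record layout (keys `sex`, `self_attr`, `mean_given`) is hypothetical, and only r and the corresponding t statistic (df = n − 2) are computed. The p-values in the text would come from the t distribution, for example via SciPy's `scipy.stats.pearsonr`.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing rho = 0 (df = n - 2); undefined when |r| == 1."""
    return r * sqrt((n - 2) / (1 - r ** 2))


def correlations_by_sex(raters):
    """Correlate each rater's self-rating with the mean rating they gave others,
    separately within each rater sex.

    `raters` is a list of dicts with hypothetical keys 'sex', 'self_attr'
    (self-perceived attractiveness) and 'mean_given' (average rating given).
    Returns {sex: (r, t, df)}.
    """
    results = {}
    for sex in sorted({row["sex"] for row in raters}):
        subset = [row for row in raters if row["sex"] == sex]
        x = [row["self_attr"] for row in subset]
        y = [row["mean_given"] for row in subset]
        r = pearson_r(x, y)
        results[sex] = (r, r_to_t(r, len(subset)), len(subset) - 2)
    return results
```

Splitting by sex before correlating, rather than pooling all raters, is what allows the pattern described above (a positive relationship for men and none for women) to show up; a single pooled r could mask it. The same function applies to the intelligence analyses by passing the composite score in place of `self_attr`.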
Among male perceivers, intelligence showed no relationship with how they rated the attractiveness of others (r = .00, p = 1.00; see Figures 4 and 5). One interpretation might be that men do not consider intelligence to be a valued attribute. Alternatively, social comparisons regarding intelligence may be less threatening to men's self-esteem. Unexpectedly, there was a negative relationship between women's intelligence and their ratings of others' attractiveness (r = −.32, p = .001): more intelligent women were harsher critics of natural beauty, whereas less intelligent women rated others as more attractive. Although we do not have an explanation for this result, it is interesting to speculate why it may be so. Perhaps attractiveness is more confounded with intelligence among women than among men. To the extent that expertise and effort contribute to the attractiveness of one's appearance (e.g. through the study and application of principles presented in fashion magazines), it seems reasonable for intelligent experts to grade others more harshly. If, however, attractiveness is believed to be an inherent property of a target rather than a skillful construction, then the intelligence of the grader should have no impact on their ratings. Of course, further research is needed to explore this proposed gender difference, namely that women attribute more effort and skill to attractiveness than men do.

Discussion
These results both replicate and extend the work of Tennis and Dabbs (1975) in several ways. First, we replicated their finding that attractiveness judgments provided by men were subject to an assumed similarity bias, such that males who considered themselves attractive exhibited a positivity bias in their ratings of others. If self-appraisal is in fact the key anchoring mechanism, then we would expect a greater positivity bias among those with extremely positive views of the self (e.g. narcissists), and perhaps a negative bias among those who suffer from low self-esteem or body image insecurity. It is difficult to compare the magnitude of our effect directly with that published by Tennis and Dabbs (1975) because they made their argument with graphical displays of data rather than correlations. Nevertheless, we feel our effect size of r = .51 is comparable to, or even stronger than, the subtle effect they reported. We therefore feel that face-to-face interaction processes likely amplify person perception effects, suggesting that studies employing photographs or video clips may be underestimating the true magnitude of such effects.
Finally, intelligence clearly did not affect attractiveness judgments in the same way that self-appraisals of attractiveness did. Intelligence and attractiveness are both attributes that can affect one's self-esteem. If assumed similarity were driven by self-esteem maintenance (Tesser, 1988), such that attractive males were more comfortable rating others highly because they themselves felt secure and unthreatened, then intelligence should have had an effect similar to the one observed with attractiveness self-appraisals. This was not observed.
A limitation of this study is that the experiment was not designed specifically to answer the research questions examined here. Instead, the data were collected as part of a larger, longitudinal project. As a consequence, we had to use objective measures of intelligence rather than self-appraisals of intelligence. If our reasoning is correct, then a person's self-appraisal of intelligence, and of any other ego-involving attribute, would be critical to generating this type of rating bias. Further, intelligence self-appraisals were probably not salient at the time of the attractiveness judgments, whereas self-appraisals of attractiveness were. A more powerful test of the intelligent-rater bias hypothesis would have been either to inform raters of their intelligence scores or to have them estimate their own intelligence immediately before making their ratings.
Despite these limitations regarding intelligence, we did find an unexpected negative bias such that intelligent women gave lower ratings of attractiveness to others. Interestingly, Tennis and Dabbs (1975) argued for a contrast effect among women raters, whereby attractive women rated others more harshly. Although we did not observe this effect with attractiveness, we found that intelligent women rated others more harshly. The similarity in the direction of these effects should not be taken as evidence that they are the same phenomenon. More carefully controlled methods are needed in future studies to ascertain the true nature of this effect.
In conclusion, researchers have consistently reported that attractive individuals are perceived positively on the basis of their physical appearance alone. Although it is well known that individual differences in person perception are large and robust (e.g. Davis & Kraus, 1997;Lippa & Dietz, 2000), little research has been done to investigate the role of raters' attributes on their judgments of attractiveness. The present study demonstrated that judgments made by men are impacted by the self-appraisals of their own attractiveness, and that judgments made by women are impacted by their intelligence. Although judgment biases were observed in both men and women, it will require further research to disentangle the different mechanisms behind each.
It has long been established that judgments of others' attributes tend to be unconsciously affected by their physical attractiveness, although the factors influencing perceptions of attractiveness are less well known (Tennis & Dabbs, 1975). The present investigation was among the first to demonstrate that rater attributes (i.e. intelligence and attractiveness) impact judgments of another's attractiveness and that these effects are different for men and women. In order to better understand the effect of rater attributes on our perceptions of attractiveness, future research is required to examine (1) the effects of other rater attributes such as specific personality traits or individual preferences, (2) the potential moderators of this relationship (e.g. culture and age), and (3) underlying mechanisms for these findings. A better understanding of rater attributes as a source of subjectivity in the perceptions of attractiveness will inform research and practice alike, providing another clue for the recognition and attenuation of detrimental biases in human judgment.