Professor Gender, Age, and “Hotness” in Influencing College Students’ Generation and Interpretation of Professor Ratings

A sample of 230 undergraduate psychology students rated their expectations of a bogus professor (who was randomly designated a man or woman and “hot” versus “not hot”) based on ratings and comments found on RateMyProfessors.com. Five professor qualities were derived using principal components analysis: dedication, attractiveness, enhancement, fairness, and clarity. Participants rated current actual psychology professors on the same qualities. Current professors were divided based on gender (man or woman), age (under 35 or 35 and older), and attractiveness (at or below the median or above the median). Multivariate analysis of covariance (MANCOVA) indicated that students expected hot professors to be more attractive but lower in clarity. They rated current professors who were male and 35 or older as lowest in clarity. Current professors scored significantly lower in dedication, enhancement, fairness, and clarity when rated at or below the median on attractiveness. Results, along with previous research, suggest that numerous factors (largely out of professors’ control) influence how students interpret and create professor ratings. Caution is therefore warranted in using online ratings to inform a variety of decisions, including students’ course selection or even administrators’ hiring and promotion decisions.


Introduction
University instructors have long been regularly evaluated by their own students. In recent decades, opportunities for college students to express their opinions of their professors have expanded beyond formal university-administered group evaluations to posts on informal websites like RateMyProfessors.com (RMP; Johnson & Crews, 2013). RMP is a website on which university students may post anonymous evaluations of their professors. Since its inception in 1999, the service has become immensely popular throughout the world; users have created over 17 million ratings for 1.6 million professors in Canada, the United Kingdom, and the United States (RMP, 2016a). Sites similar to RMP operate in other countries; for example, Rate My Teachers (ratemyteachers.com) serves the Republic of Ireland, Australia, and New Zealand. In addition to writing narrative comments about their professors, students use RMP to rate their professors on helpfulness, instructional clarity, and course easiness using a rating scale of 1 (low) to 5 (high). Scores ranging from 3.5 to 5 are considered "good," and scores ranging from 2.5 to 3.4 are considered "average."
Professors cite several sources of concern regarding the validity of RMP ratings (e.g., Davison & Price, 2009; Hartman & Hunt, 2013; Sonntag, Bassett, & Snyder, 2009). First, there is no guarantee that ratings have actually been posted by former students of the professor (Johnson & Crews, 2013; Montell, 2006; Otto, Sanford, & Ross, 2008; Timmerman, 2008). For example, the first and second authors have each been rated at least once on RMP for classes they have never taught, possibly a result of students not correctly remembering the names of the actual instructors. While instances such as this may produce laughter and seem fairly harmless, more alarming cases have been recorded involving negative postings made by rivals or disgruntled colleagues instead of students (see Carnevale, 2006).
Second, even when postings are crafted by actual students, concerns remain about the validity of such postings as reflections of teaching quality or as windows into what potential students may expect from taking a class with a particular professor (Legg & Wilson, 2012). For example, students self-selecting to participate in RMP posting may harbor deeply felt or extreme views and may not represent a professor's general student body (Boswell, 2015; Legg & Wilson, 2012). Further concerns exist about possible biasing factors shaping how online professor ratings are both interpreted and created.
The purpose of the current investigation was to examine potential sources of bias in both interpreting and posting online professor ratings. It should be noted that there is evidence for a variety of sources of bias in university-administered teaching evaluations (which would presumably involve fewer concerns about the identity of the evaluator than RMP-type postings). For example, university-administered teaching evaluations may be subject to bias from whether the evaluated course is required versus elective (Divoky & Rothermel, 1988; Feldman, 1978; Patrick, 2011; Petchers & Chow, 1988; Scherr & Scherr, 1990), whether the course is higher level versus lower level (Goldberg & Callahan, 1991; Moritsch & Suter, 1988; Patrick, 2011), and whether it is a humanities or social sciences course versus a math or science course (Cashin, 1992; Patrick, 2011). Students' perceptions of the instructor's personality characteristics or interaction style also impact evaluations, which could reflect influence on actual quality of teaching or on more peripheral features like likability (Ahmadi, Helms, & Raiszadeh, 2001; Clayson & Sheffet, 2006; Feldman, 1986; Hart & Driver, 1978; Jenkins & Downs, 2001; Patrick, 2011; Widmeyer & Loy, 1988; R. Wilson, 1998). Moreover, other teaching-irrelevant qualities such as style of dress (Eadie, 1996; Sebastian & Bristow, 2008), formality of name (e.g., title and last name versus first name; Sebastian & Bristow, 2008), and ability to be entertaining (Gotlieb, 2011) affect students' evaluations of professors. These teaching-irrelevant qualities exert such an influence upon professors' evaluations that it is possible for students to predict how a professor will be evaluated simply by watching a muted video clip of the professor; entertaining individuals, even with no demonstrated knowledge of the topic, benefit from bias and receive higher teaching evaluations (Ambady & Rosenthal, 1993).
Given that RMP provides students with a similar opportunity to offer opinions of professors, it is likely that RMP ratings will be affected by the same sources of bias that affect their university-administered counterparts. Moreover, the lack of quality control (Johnson & Crews, 2013) inherent in ratings on RMP and similar sites likely allows for additional sources of bias to shape the content of ratings, for example, course easiness (Felton, Koper, Mitchell, & Stinson, 2008; Felton, Mitchell, & Stinson, 2004) and professor race (Reid, 2010).
Investigating potential sources of bias or distortion of online professor ratings is particularly important because, while RMP presents itself as an entertainment site, its data is commonly used for numerous purposes well beyond pure entertainment (Landry et al., 2010). Research has demonstrated that many students use such sites before signing up for classes with specific professors (Hossain, 2009), and exposure to online professor ratings may actually influence students' expectations and motivations for their own performance in a course (Edwards, Edwards, Qing, & Wahl, 2007; Edwards, Edwards, Shaver, & Oaks, 2009). Exposure to positive or negative online ratings has even been shown to influence students' teaching evaluations of an actual classroom lecture (Lewandowski, Higgins, & Nardone, 2012). Moreover, RMP evaluations may influence students' in-class behaviors such as note-taking and participation in class discussions and activities (Kowai-Bell, Guadagno, Little, Preiss, & Hensley, 2011). Taken together, these findings suggest that RMP content may exert a significant influence upon students' learning and academic achievement.
In addition to its effects on students, RMP's consequences also extend to institutions and professors. For example, RMP scores and narrative comments at least partially contribute to some college rankings (Howard, 2014; Johnson & Crews, 2013) as well as promotion and hiring decisions (Johnson & Crews, 2013; Montell, 2006; Pannapacker, 2007). Professors, too, appear affected by online rating content in terms of their affect and self-efficacy, and the effects do not differ from those of reading more respected university-administered student evaluations of teaching (Boswell, 2016). Past research regarding targeted possible influences on interpretation and generation of online professor ratings will be described in detail below.

Gender of Professor
Gender expectations may lead students to interpret professor ratings or rate professors differently based on whether they are men or women. Prior research has indicated that college students tend to rate men professors more favorably than women professors (Abel & Meltzer, 2007; Basow & Silberg, 1987; Joye & Wilson, 2015). This bias extends even to the online classroom, where students and professors do not interact in person. For example, assistant instructors, working under both a stereotypically male and stereotypically female pseudonym, were evaluated more favorably when using the male identity (MacNell, Driscoll, & Hunt, 2015). Other research has revealed, however, that a main effect of professor gender is likely best interpreted within the context of additional moderating factors. For example, college students evaluate women professors more harshly when they do not conform to gender-based expectations of helpfulness and flexibility (Bennett, 1982), and women are evaluated more severely than men when they have high grading standards and teach academically rigorous courses (Sinclair & Kunda, 2000). Other research has suggested that student raters interpret professor qualities differently when rating men versus women, attributing lack of clarity to low effort from men professors but low ability in women professors (Stuber, Watson, Carle, & Staggs, 2009). Furthermore, students rating a supposed applicant for a university teaching position appear to expect different qualities from men versus women applicants (Burns-Glover & Veith, 1995). Based on this body of findings, the following hypotheses were included:
Hypothesis 1: College students' interpretations of online professor ratings will differ by professor gender (whether the professor is described as a man versus a woman).
Hypothesis 2: College students will rate their own professors differently depending on whether they are a man versus a woman.

Age of Professor
In addition to evidence for gender bias, studies indicate a professor's age also affects students' evaluations of teaching (Bianchini, Lissoni, & Pezzoni, 2013; Feldman, 1983; Kinney & Smith, 1992). For example, Arbuckle and Williams (2003) demonstrated that students would evaluate a lecture more positively if they were led to believe that it was delivered by a young (i.e., under 35) man, supporting the notion that students expect college professors to be or at least resemble younger men (Messner, 2000). Moreover, in an analysis of RMP ratings, Stonebraker and Stone (2015) found that elevated age negatively affects students' ratings of professors' teaching; these effects begin as early as professors' mid-forties. Other recent findings also indicate that students rate older professors most harshly (Joye & Wilson, 2015; J. Wilson, Beyer, & Monteiro, 2014; Zabaleta, 2007), possibly because of the professors' dissimilarity to students who are typically in their late teens or early twenties (Gehrt, Louie, & Osland, 2015). The following hypotheses were generated from this evidence supporting a potential main effect of professor age and, based on the findings of Arbuckle and Williams (2003), a moderating effect of professor gender on this age effect:
Hypothesis 3: College students will rate their own professors differently depending on whether they are under 35 versus 35 or older.
Hypothesis 4: Professor age and gender will interact in influencing students' evaluations of their own professors.
While college students may interpret online professor ratings differently depending on whether the professor is described as younger versus older, this hypothesis was not tested in the current study. This was largely because of concerns that online comments on professor rating sites do not often include information regarding professor age.

The Chili Pepper (or "Hotness")
Past studies also have indicated that professors higher in hotness (defined in various ways, but generally encompassing students' subjective appraisal of professors' physical features) are perceived more positively than those lacking hotness (Bonds-Raacke & Raacke, 2007; Buck & Tiene, 1989; Felton et al., 2004; Felton et al., 2008; Freng & Webber, 2009; Hamermesh & Parker, 2005; Liu, Hu, & Furutan, 2013; Riniolo, Johnson, Sherman, & Misso, 2006; Romano & Bordieri, 1989). Teaching at the college level may represent one of the countless situations in which people attribute higher degrees of socially desirable traits to attractive people but not less attractive individuals (Dion, Berscheid, & Walster, 1972), often summarized with the phrase "what is beautiful is good" (Eagly, Ashmore, Makhijani, & Longo, 1991, p. •••). RMP provides information on hotness with the chili pepper feature. Professors with a chili pepper are hot; professors without a chili pepper lack hotness (RMP, 2016c). Research on the effects of the dichotomous chili pepper feature has yielded mixed results; some studies indicate little influence on perceptions of teaching quality (Coladarci & Kornfield, 2007) and even lack of user respect for its meaningfulness (Kindred & Mohammed, 2005), but others demonstrate that professors with a chili pepper receive more favorable ratings (Lawson & Stephenson, 2005). For example, hotness may impact ratings of professors' clarity and helpfulness (Bonds-Raacke & Raacke, 2007). The pool of evidence, while mixed, led the authors to include the following hypotheses:
Hypothesis 5: College students will interpret online professor ratings differently depending on whether the professor is noted to be hot or not.
Hypothesis 6: College students will rate current professors differently depending on how physically attractive they find them.
Given the evidence for self-reported females rating their instructors differently than self-reported males (Burns-Glover & Veith, 1995; Kohn & Hatfield, 2006), exhibiting preference for professors of the same gender (Gehrt et al., 2015), and being more influenced by hotness in evaluating professors (Liu et al., 2013), all hypotheses were tested while controlling for participants' reported sex. Because some have proposed perceived similarity as the driving feature in age effects on professor ratings (Gehrt et al., 2015), participant age was also included as a control variable for both hypotheses addressing professor age.

Participants
The convenience sample consisted of 230 college undergraduate students enrolled in introductory psychology and lifespan developmental psychology classes at a public university in the southern United States. Participants were recruited from 18 class sections ranging in size from 40 to 100 students. Professors teaching these sections during the year of study recruitment included six men and six women. Students received course credit for participation; the option to participate in other studies and an alternate assignment were available to students who chose not to participate or did not meet inclusion requirements for this investigation. This study specified that participants must have been 18 years of age or older at the time of participation.
Participants providing their sex (n = 228) included 43 males (18.70%) and 185 females (80.43%). The average age was 19.54 years (SD = 1.62). The sample was predominantly White (70.00%), with 20.43% African American, 3.91% Hispanic or Latino, 2.61% Asian, and 0.87% Native American, Aleut, or Aboriginal peoples. Participants' self-reported grade point averages were somewhat high (M = 3.18; SD = .60), averaging in the range of a letter grade of B. The vast majority of participants (91.74%) had previously visited RMP, and 76.50% reported using the site to make decisions about enrolling in classes at least some of the time. Out of the full sample, 218 participants' (41 males, 177 females) data were complete on all proposed covariates and independent and dependent variables and were included in final statistical analyses. The Institutional Review Board reviewed and approved this study prior to recruitment.

Procedure
Data collection occurred online. Participants accessed the questionnaire using a web link posted by the primary investigator on the psychology department's participant recruitment site. Upon opening the survey, participants were randomly assigned to view one of the different versions of the bogus professor's online rating summary. Before viewing this material and the follow-up questions regarding their expectations of the bogus professor, their ratings of a current actual professor, and demographics, participants initialed an informed consent form. They were required to complete the questionnaires in one session. Instructions stated that participants were allowed to skip any items with which they felt uncomfortable. Participants were assured that all information would be kept confidential and that no responses would be shared with their psychology professors. Data were managed by a graduate research assistant not teaching any psychology classes so as to maintain strict confidentiality.

Measures
Demographics. Participants completed closed-ended items addressing their own sex and race. In addition, they completed open-ended items regarding their own current age and grade point average on a 5-point scale (0 = F, 4 = A).
Previous exposure to RMP. Participants reported whether they had ever visited RMP (a yes/no item) and how often they made decisions about enrolling in classes based on RMP ratings using a rating scale from 1 (never) to 5 (always or almost always).
Perceived demographics of current professor. Participants were asked how old they believed their current psychology professor to be (under 35 years old versus 35 years old or older) using a closed-ended item. They additionally responded as to whether their current psychology professor was a man or a woman. Out of the full sample of responses, 106 participants reported that their professor was a man, while 115 identified their professor as a woman. In addition, 95 participants perceived their professor to be under 35, and 126 perceived their professor to be 35 or older.
Interpretation of online professor ratings. Because the investigators desired to learn more about students' expectations from and creation of professor ratings, professor ratings were assessed in two distinct ways. First, to measure students' expectations of a professor based on reading professor ratings, participants read an online rating of a bogus professor and then rated the professor on a series of teaching and personal qualities (see Table 1). The stimulus professor's rating was presented in the format used on RMP; ratings were in the average range for helpfulness (3.2 out of 5) and in the good range for clarity (4.2 out of 5), easiness (4.0 out of 5), and overall quality (3.8 out of 5). The researchers intended the scores to indicate neutral to good quality. Extreme scores were avoided so that the scores themselves would not overwhelmingly command attention. The description also listed whether the professor had a chili pepper to indicate being hot. There were four different versions of the described professor: a woman with no chili pepper, a woman with a chili pepper, a man with no chili pepper, and a man with a chili pepper. The professor was always listed with the intentionally gender-neutral name "Alex Johnson," but pronouns differed between female and male versions.
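The four stimulus versions amount to a 2 × 2 between-subjects crossing of professor gender and chili pepper status. As a rough illustration only (the survey software and randomization routine are not described in this article, so the helper name `assign_condition` and the fixed seed are hypothetical), the crossing and a simple equal-probability assignment might be sketched as:

```python
import itertools
import random

# The two manipulated factors described above.
GENDERS = ("woman", "man")
CHILI_PEPPER = (False, True)

# Crossing the factors yields the four versions of "Alex Johnson".
CONDITIONS = list(itertools.product(GENDERS, CHILI_PEPPER))


def assign_condition(rng):
    """Pick one of the four versions with equal probability (hypothetical helper)."""
    return rng.choice(CONDITIONS)


rng = random.Random(0)  # fixed seed only so the illustration is repeatable
assignments = [assign_condition(rng) for _ in range(230)]
counts = {cond: assignments.count(cond) for cond in CONDITIONS}
print(counts)
```

Simple random assignment of this kind does not guarantee equal cell sizes, so the four counts will usually differ somewhat.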
After reading the bogus RMP listing, participants were asked how often they would predict the professor would display a series of qualities. Items were scaled from 1 (never or almost never) to 4 (always or nearly always). The items presented (see Table 1) were crafted for the present investigation. The content of the items was generated by the investigators based on frequent content of their own teaching evaluations and online professor ratings. Additionally, students enrolled in independent research were asked to contribute any additional items they deemed appropriate.
To examine potential subcategories of teaching qualities, a principal components analysis (PCA) was conducted using promax rotation because items were anticipated to be correlated. The Kaiser-Meyer-Olkin measure of sampling adequacy was .92, and Bartlett's test of sphericity was statistically significant (p = .00), indicating the data was suitable for PCA (Field, 2013). Factor loadings obtained from the pattern matrix are summarized in Table 1. The authors employed a criterion of a factor loading of .40 or higher for inclusion of an item in a particular subcategory. The total number of factors or subcategories was determined using the Kaiser criterion of an eigenvalue of at least 1.00 (Field, 2013).
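The two retention rules named above can be illustrated with a small sketch. It assumes the PCA was run on a correlation matrix, in which case each item contributes one unit of variance and a component's share of variance is its eigenvalue divided by the number of items. The eigenvalues and loadings below are invented for illustration and are not the study's values; the function names are likewise hypothetical.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Return (eigenvalue, percent of variance) for components meeting the threshold."""
    n_items = len(eigenvalues)  # one eigenvalue per item for a correlation-matrix PCA
    return [
        (ev, 100.0 * ev / n_items)
        for ev in sorted(eigenvalues, reverse=True)
        if ev >= threshold
    ]


def items_loading_on(loadings, cutoff=0.40):
    """Names of items whose loading on one factor meets the .40 cutoff."""
    return [item for item, value in loadings.items() if value >= cutoff]


# Hypothetical eigenvalues for a 10-item scale (they sum to 10).
example_eigenvalues = [4.1, 1.6, 1.1, 0.9, 0.7, 0.5, 0.4, 0.3, 0.25, 0.15]
retained = kaiser_retain(example_eigenvalues)
for ev, pct in retained:
    print(f"eigenvalue {ev:.2f} -> {pct:.1f}% of variance")

# Hypothetical pattern-matrix loadings of three items on one factor.
example_loadings = {"item_a": 0.72, "item_b": 0.41, "item_c": 0.18}
print(items_loading_on(example_loadings))
```

In this made-up example three components clear the eigenvalue threshold, and only the first two items would be assigned to the factor.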
As seen in Table 1, the first factor, Dedication, accounting for 36.77% of variance, included items centered on the theme of professors behaving in a professional and respectful manner and conveying general enjoyment of teaching. The second factor, Attractiveness, explaining 12.95% of variance, included items about the professor's physical appearance. The third factor, Enhancement, accounting for 7.69% of variance, included items referring to enrichment of teaching, what some might label "the little extras" in obtaining and maintaining student interest. Next, the fourth factor, Fairness, representing 4.02% of variance, reflected a theme of evenhandedness and consistency in teaching and grading. Finally, the fifth factor, Clarity, explaining 3.42% of variance, centered on making oneself understood by students. Based on these factor loadings, five composite variables were created with the mean scores for included items. The mean was used in place of a sum or other calculation so as to maintain the item scaling of 1 (never or almost never) to 4 (always or nearly always). Descriptive statistics, internal consistency within factors, and correlations among the computed variables are listed in Table 2. As seen in Table 2, internal consistency, as assessed with Cronbach's α, was high (>.70) for all but the clarity variable. This composite variable included the fewest individual items, and scales with fewer items often exhibit lower consistency as calculated with Cronbach's formula (Peterson, 1994).
Evaluation of current professor. Participants were also asked to evaluate their current psychology professor. The psychology professor was selected as the target because all participants were enrolled in a psychology class; however, their other coursework would vary. Participants rated the professor using the same items used to assess interpretations of online ratings of a bogus professor (see original stem items listed in Table 1).
Specifically, students indicated how often their current professor displayed the characteristic in question using a scale from 1 (never or almost never) to 4 (always or almost always). For consistency, the same composite variables were computed. The investigators did conduct PCA with these items as well to ensure that similar factors emerged. Considerable overlap with the original PCA conducted with the responses for the bogus professor was evident in these results. Descriptive statistics, internal consistency, and bivariate correlations for these variables addressing evaluation of the current psychology professor are presented in Table 3.
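The composite scoring and internal-consistency check used for both sets of ratings can be sketched as follows. This is a minimal illustration under stated assumptions: the responses are invented, the function names are hypothetical, and the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) is used with sample variances, which may differ in detail from the software the investigators used.

```python
from statistics import mean, variance


def composite_score(item_responses):
    """Mean of one participant's item responses; stays on the 1-4 item scale."""
    return mean(item_responses)


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of participants x items (sample variances)."""
    k = len(rows[0])  # number of items in the composite
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 participants x 3 items on the 1-4 scale.
responses = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 3, 2],
    [4, 3, 4],
    [1, 2, 2],
]
scores = [composite_score(r) for r in responses]
alpha = cronbach_alpha(responses)
print(scores, round(alpha, 2))
```

Because the composite is a mean rather than a sum, every score remains interpretable on the original 1 to 4 response scale regardless of how many items a factor contains.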

Interpretation of Online Professor Ratings
To examine hypothesized differences in participants' expectations of a bogus professor based on the professor's gender and designated hotness, a multivariate analysis of covariance (MANCOVA) was conducted with the five composite dependent variables (dedication, attractiveness, enhancement, fairness, and clarity) and two factor or grouping variables: professor gender (man versus woman) and hotness (chili pepper versus no chili pepper). Participant sex was included as a covariate. MANCOVA was conducted in place of a series of analyses of covariance (ANCOVAs) due to significant correlation between most of the five dependent variables (r = .05 to r = .63; p < .01 for 8 out of 10 correlations; see Table 2). The assumption of homogeneity of covariances was examined with Box's test of equality of covariance matrices, and results were not significant (p = .26), suggesting the assumption had not been violated. However, the assumption of normality did appear to be violated for all dependent variables based on significant Kolmogorov-Smirnov and Shapiro-Wilk tests and evident negative skew (with scores situated at higher values) in histograms. Because ANOVA procedures are considered robust to violations of the normality assumption (Field, 2013; Mertler & Vannatta, 2005), analyses proceeded, but results should be interpreted with caution.
The main effect of professor gender was not statistically significant (Wilks' Λ = .99; F = .45; p = .81); however, the main effect of hotness was statistically significant (Wilks' Λ = .53; F = 38.16; p = .00; ηp² = .47, or large effect size). The interaction effect for Gender × Hotness was explored, but it was not significant.
Follow-up tests were conducted using a series of univariate ANCOVAs. For the hotness effect, differences at the univariate level were significant for attractiveness (F = 148.17; p = .00; ηp² = .40, or large effect size), which was not surprising. Less anticipated, there were significantly different expectations of professor clarity based on hotness (F = 9.80; p = .00; ηp² = .04, or small effect size). Examination of the means (see Figure 1) revealed that participants expected Alex Johnson to be more attractive and lower in clarity when a chili pepper was included in the online professor rating summary presented.
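The multivariate effect size reported for hotness is consistent with one commonly used definition of partial eta squared for Wilks' lambda, 1 − Λ^(1/s), where s is the smaller of the number of dependent variables and the hypothesis degrees of freedom. The article does not state which formula was used, so the sketch below is an assumption offered only as a plausibility check, and the function name is hypothetical. For a two-level factor the hypothesis degrees of freedom equal 1, so the expression reduces to 1 − Λ.

```python
def partial_eta_squared_from_wilks(wilks_lambda, n_dvs, df_hypothesis):
    """Multivariate partial eta squared: 1 - Lambda ** (1 / s), s = min(p, df_h)."""
    s = min(n_dvs, df_hypothesis)
    return 1.0 - wilks_lambda ** (1.0 / s)


# Hotness main effect: Wilks' Lambda = .53, five dependent variables,
# two-level factor (hypothesis df = 1), as reported in the text.
eta_hotness = partial_eta_squared_from_wilks(0.53, n_dvs=5, df_hypothesis=1)
print(round(eta_hotness, 2))
```

Under this assumption the reported Λ of .53 corresponds to the reported effect size of .47.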

Rating of Current Professor
Examination of the effects of professor gender (man versus woman), perceived age (under 35 versus 35 or older), and student-rated attractiveness (at or below the median versus above the median) on ratings of four of the current psychology professor's apparent qualities (dedication, enhancement, fairness, and clarity) also was conducted with MANCOVA, with both participant sex and current age in years included as covariates. For this analysis, attractiveness was excluded as a dependent variable because it was used as a grouping variable. The assumption of homogeneity of covariances was again examined with Box's test of equality of covariance matrices, but results were significant (p = .00) for this analysis, suggesting the assumption had been violated. Box's test is known to be highly sensitive, and unequal cell sizes can increase the likelihood of obtaining significant results. For this analysis, ensuring equal cell sizes was difficult because existing professor qualities were being evaluated. Specifically, there were 104 participants reporting having men as professors compared to 113 reporting women professors, and 94 participants believing their professor to be under 35 compared to 123 perceiving their professor as 35 or older. In addition, the assumption of normality did once again appear to be violated for all dependent variables based on significant Kolmogorov-Smirnov and Shapiro-Wilk tests and evident negative skew (with scores situated at higher values) in histograms. Because ANOVA procedures are considered robust to these violations (Field, 2013; Mertler & Vannatta, 2005), analyses proceeded as planned, but results should still be interpreted cautiously.
Participants' ratings of their current psychology professor significantly differed by professor gender (Wilks' Λ = .76; F = 13.17; p = .00; ηp² = .24, or large effect size) and perceived age (Wilks' Λ = .82; F = 9.39; p = .00; ηp² = .19, or large effect size), but the interaction between the two variables also was statistically significant (Wilks' Λ = .92; F = 3.47; p = .01; ηp² = .08, or medium effect size); therefore, the main effects will not be interpreted further. The follow-up ANCOVA indicated that the significant difference was on ratings of professor clarity (F = 7.59; p = .01; ηp² = .04, or small effect size). As illustrated in Figure 2, younger professors were generally rated higher in clarity, and men in the 35 or older category were rated as less clear than women in that age group. There was also a significant main effect for professor attractiveness (Wilks' Λ = .87; F = 7.87; p = .00; ηp² = .13, or large effect size). As illustrated in Figure 3 and supported by follow-up ANCOVA, professors considered more attractive (above the median of 2.60) were rated significantly higher on dedication, enhancement, fairness, and clarity.
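The attractiveness grouping used in this analysis is a median split of the attractiveness composite into "at or below the median" and "above the median." A minimal sketch of that rule follows; the scores are invented (chosen so the median happens to be 2.6) and the function name is hypothetical.

```python
from statistics import median


def median_split(scores):
    """Return the median and a group label for each score."""
    cut = median(scores)
    labels = ["above" if s > cut else "at_or_below" for s in scores]
    return cut, labels


# Hypothetical attractiveness composites on the 1-4 scale.
attractiveness = [1.8, 2.2, 2.6, 2.6, 3.0, 3.4, 3.8]
cut, groups = median_split(attractiveness)
print(cut, groups.count("at_or_below"), groups.count("above"))
```

Scores tied at the median all fall in the lower group, which is one reason a median split rarely produces two groups of exactly equal size.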

Discussion
The current investigation addressed several hypotheses regarding possible sources of bias in college students' interpretation and creation of online professor ratings such as those available on RMP. Findings revealed that online professor ratings may be swayed by a variety of personal features of professors, many of them being, for the most part, out of the professors' control. The investigators obtained support for bias stemming from professor gender, age, and hotness, with specific findings not necessarily matching with previous findings.

Gender and Age of Professor
Results of the current investigation supported student bias in favor of women professors when rating an actual professor but not when interpreting or forming expectations from an online professor rating. Professor gender interacted with professor age when evaluating a current psychology professor on clarity. Specifically, male professors believed to be 35 or older received the lowest clarity ratings compared to younger men and all women evaluated. Given that the sample was largely female and young, this finding may stem from dissimilarity between participants and older men as professors; younger female participants may perceive older men as too dissimilar from themselves. This is consistent with previous research that individuals are more dismissive of information that they receive from others they perceive to be different from themselves (Wheeless, 1974).
Additional explanations have been proposed for both gender- and age-related bias against professors. For example, age dissimilarity may produce a communication gap with undergraduates; this gap is especially wide for older faculty members who may have less familiarity with widely used technology and well-known examples from popular culture (Gehrt et al., 2015). Many digital technology-native university students (similar to the current sample) currently expect professors to incorporate PowerPoint slideshows, social media, and occasional YouTube clips into lectures and to post supplemental material online (Borboa, Joseph, & Spake, 2012; Griesemer, 2011; Saeed, Yun, & Sinnappan, 2009); however, older faculty members may have developed their preferred teaching style and format when such features were not available or commonplace. Studies have indicated that older faculty members are more reluctant to implement computer-based teaching methods (Rousseau & Rogers, 1998), and faculty members in general report little to no formal training in implementing enhancements such as web-based instruction (Vodanovich & Piotrowski, 2005). While it is completely possible to deliver a quality lecture using only chalk or a dry-erase marker and a mounted wall board, students nonetheless may prefer notes and supplements accessible through electronic devices and examples linking course material to the video clips, memes, and streaming programs they commonly view on the same devices outside of class.
In addition to dissimilarity, the bias against older men as professors may exist because there are more women teaching on college campuses than in the past. Previous evidence for preference for professors who are men (see Basow & Silberg, 1987) may reflect an earlier time when women professors were more of a novelty. Current university students in the United States are likely exposed to a mixture of men and women as professors, potentially eroding the once dominant stereotype of the university professor as male authority figure. Furthermore, the higher number of females in the sample may have further swayed findings because women have been shown to particularly value female faculty over male faculty (Bachen, McLoughlin, & Garcia, 1999; Basow, 2000). The body of evidence suggests a complex array of influencing factors at work and beyond professors' command, with physical attractiveness further expanding the list of advantageous characteristics professors may be fortunate to possess out of sheer luck or effort unrelated to teaching ability.

Professor Hotness
Support for bias based on professor hotness emerged as well, although not all results fit expectations. It was hardly shocking that participants anticipated a professor designated as hot to be higher in attractiveness. Such a result implies that participants noticed the chili pepper and believed it to be accurate. More surprising was the finding that hot professors were expected to be lower in clarity, suggesting a negative effect of hotness. That is, hot professors were expected to look good but not teach as well in at least one domain. This result directly contradicts that of Bonds-Raacke and Raacke (2007), who reported higher ratings of clarity for professors rated higher in attractiveness.
The present finding is among a growing body of findings suggesting that the "what is beautiful is good" stereotype may be more complex, or subject to more exceptions, than previously believed. Other evidence for this includes Chia, Allred, Grossnickle, and Lee's (1998) finding that, based on photographs of both attractive and unattractive men and women, unattractive men were rated highest in ability, while unattractive women were rated lowest in ability. In addition, Mehng (2015) observed a negative effect of attractiveness on perceived competence in the presence of low warmth. The present finding may reflect what some have labeled the "beauty is beastly" effect (see Johnson, Podratz, Dipboye, & Gibbons, 2010). That is, there is a small body of evidence that attractiveness can be detrimental (for women, specifically) in some situations. In particular, physically attractive women are rated more negatively when applying for jobs perceived as more masculine in nature and for which physical appearance is deemed unimportant. The present investigation did not find support for an interaction effect between gender and hotness, though, casting doubt on the "beauty is beastly" effect operating as it is typically described. Replication and further expansion of this research are clearly called for to better understand why college students might expect less clarity from professors having a chili pepper.
The effects of professor hotness on ratings of current professors are more consistent with prior work. Professors rated as more attractive were rated higher in all other areas. In addition to the possibility that students indeed perceived more attractive professors as more competent, it is possible that well-liked professors were rated favorably in all areas assessed regardless of their actual level of hotness. This is consistent with previous research indicating that RMP attractiveness ratings are influenced by students' positive illusions of professors (Theyson, 2015). Additionally, because the grouping for this analysis resulted from a quasi-experimental rather than a true experimental design, the authors cannot say for certain whether attractiveness influenced the other professor ratings in a unidirectional manner. Furthermore, because attractiveness was not rated by objective outside observers, it is certainly possible that less attractive professors with other strong teaching qualities became more attractive to students over time. Still, given that many aspects of hotness are out of one's control, and more so with age, the results are concerning: professors not graced with hotness may be unduly penalized for a feature not intrinsically linked to actual teaching ability. Likewise, higher ratings assigned to professors lucky enough to possess pleasing physical features may place them at an advantage in competing for student enrollment, faculty positions, teaching awards, tenure, and promotion.

Limitations and Future Directions
This investigation had several limitations. First, the sample was limited to students in lower-level psychology courses. Given that course level (Goldberg & Callahan, 1991; Moritsch & Suter, 1988; Patrick, 2011) and discipline (Cashin, 1992; Patrick, 2011) may bias student evaluations, results may not generalize to students taking upper-level classes or those from other disciplines. The unequal representation of male and female students prevented comparisons based on participant sex, though sex was included as a covariate in all analyses. The current study also did not include professor race as a variable because the current psychology professors being evaluated were all White. Previous research suggests racial-minority faculty members are rated most harshly on RMP, and that Black men as professors are rated particularly negatively (Reid, 2010); therefore, race should be included in analyses when possible.
Convenience sampling presents an additional limitation of the current study. Most participants identified as White. This may limit generalizability of results to larger, more ethnically diverse college groups. Future research could extend the current study to college campuses with greater ethnic and racial diversity in their student body. Moreover, the average participant age in the sample was 19 years, placing this sample in the traditional college student age range (National Center for Education Statistics, 2016). Findings from this largely traditional-aged sample may not generalize well to nontraditional students. This may be particularly applicable to findings related to technology use in the classroom; nontraditional college students may differ from digital-native, traditional-aged college students in the importance that they place on classroom technology use.
Additionally, geography may limit the generalizability of the findings. Participants were recruited from a public university in the southern United States; however, RMP evaluations represent professors across the United States as well as Canada and the United Kingdom. Future investigations may benefit from recruitment of participants from all regions represented on RMP to determine whether findings from an American sample generalize to student populations in other countries with RMP. It is also important to learn whether these findings would generalize to students who use similar services in countries where RMP is currently unavailable. These students include those who use the Rate My Teachers Republic of Ireland, Rate My Teachers Australia, and Rate My Teachers New Zealand websites. Regardless of the rating site used, these findings suggest that student ratings available through unofficial, non-university-affiliated sources (e.g., RMP, Rate My Teachers, social media) have the potential to impact students' perceptions of their professors. This should be taken seriously given that these perceptions, formed before students ever meet professors, can impact students' motivations for a course and course-related behavior (Edwards et al., 2007; Kowai-Bell et al., 2011).
Because the measures used in hypothesis testing were developed specifically for this study, replication is needed, including attempts to more firmly establish the reliability and validity of the measures. Alternate means of measuring relevant variables should be explored, too. For example, when examining the effect of hotness on evaluations of current professors, it would be desirable to have trained outside coders objectively rate physical attractiveness. Other key features to assess would be professors' actual teaching style, vocal quality, organization, and evident personality features. Again, this information would best be assessed by objective coders unfamiliar with the professors.

Implications
Taken together, and keeping these limitations in mind, these findings indicate a multifaceted and complicated association between teaching competence and student evaluations. Moreover, they suggest cause for concern regarding the widespread and increasing use of online professor ratings by students and university administration. These seemingly fun ratings have the potential to influence course enrollment, sway expectations of students who do enroll in a given course, and tarnish a professor's reputation. Alarmingly, these negative consequences may result not from poor teaching so much as from largely uncontrollable factors such as a professor not being the preferred gender, not looking or acting young enough, or not being as easy on the eyes as peers or competitors. Sites like RMP may rate professors, but findings like these prompt us to ask whether they truly rate teaching or have any appropriate place in students' or university administrators' decision making.
Despite these concerns about the validity of RMP ratings, the fact remains that they can and sometimes do influence decision making. For example, students perceive these evaluations as credible tools to inform their education-related decisions (Field, Bergiel, & Viosca, 2008; Davison & Price, 2009; Hayes & Prus, 2014; Landry et al., 2010). The influence of RMP-style ratings on students' decision making suggests that faculty members should not hastily and completely reject their content, despite the ratings' biases. Further supporting the case that RMP content merits some faculty attention is evidence that RMP narrative content is pertinent to teaching (Otto et al., 2008) and focused on instruction-related characteristics such as content knowledge, clarity in communication, and organization (Hartman & Hunt, 2013; Kindred & Mohammed, 2005; Silva et al., 2008).
Although RMP ratings are not intended to provide formative feedback to faculty, some faculty members may nonetheless be interested in exploring their ratings. We suggest that faculty members seeking formative information from RMP ratings approach them with techniques similar to those used for formal, university-administered student evaluations. For example, Buskist and Hogan (2010) recommend a systematic approach to interpreting student ratings. First, remove all comments that are irrelevant to course content or teaching (e.g., "Her clothes are ugly"). Next, remove all comments that lack concrete, specific information about teaching or the course (e.g., "She is super" or "I think it's stupid that we have to take these classes to graduate"). Then, group comments into two categories: (a) aspects of teaching and course content that can be changed (e.g., "Assignments would work better in a different sequence") and (b) aspects of teaching and course content that cannot be changed (e.g., "This subject has a lot of technical information"). Aspects of teaching and content that can be changed may also be further categorized as (i) things useful to change and (ii) things not useful to change. For example, comments such as "I hate that we have to do outside reading" target pedagogically useful components of a course that are worth retaining despite their unpopularity with some students.
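For readers who keep their comments in a spreadsheet or script, the sorting steps described above could be sketched roughly as follows. This is only an illustrative sketch, not a tool provided by Buskist and Hogan (2010): the `Comment` class, the `triage` function, the bucket names, and the true/false flags are all hypothetical, and the flags stand in for judgments that the faculty member must still make by reading each comment. The example comments are those quoted above; the flags assigned to them are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    """One student comment plus the reader's own (manual) judgments about it."""
    text: str
    relevant: bool = True          # about course content or teaching?
    specific: bool = True          # concrete, specific information?
    changeable: bool = True        # could this aspect be changed?
    useful_to_change: bool = True  # would changing it help pedagogically?


def triage(comments):
    """Sort comments into buckets, applying the steps in the order given in the text."""
    buckets = defaultdict(list)
    for c in comments:
        if not c.relevant:
            buckets["removed_irrelevant"].append(c.text)
        elif not c.specific:
            buckets["removed_nonspecific"].append(c.text)
        elif not c.changeable:
            buckets["cannot_change"].append(c.text)
        elif c.useful_to_change:
            buckets["can_change_useful"].append(c.text)
        else:
            buckets["can_change_not_useful"].append(c.text)
    return dict(buckets)


# Flags below are assumed judgments for illustration only.
sample = [
    Comment("Her clothes are ugly", relevant=False),
    Comment("She is super", specific=False),
    Comment("Assignments would work better in a different sequence"),
    Comment("This subject has a lot of technical information", changeable=False),
    Comment("I hate that we have to do outside reading", useful_to_change=False),
]

result = triage(sample)
for bucket, texts in result.items():
    print(f"{bucket}: {texts}")
```

Only the comments landing in the hypothetical `can_change_useful` bucket would then be candidates for revising the course; the remaining buckets simply document why a comment was set aside.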
In addition to their influence upon student decision making, RMP ratings may also influence administrator decision making regarding hiring, tenure, and promotion. We caution against their use in this way, however. Given that course easiness influences RMP's overall quality ratings (Felton et al., 2008), instructors may be tempted to "water down" or reduce the rigor of their courses to improve their overall quality ratings. This concern regarding reduced rigor extends to the results of formal, university-administered student evaluations of teaching as well (Zabaleta, 2007). When the results of any form of student evaluation, formal or informal, are used in summative hiring, tenure, and promotion decisions, it is important to include other assessments of teaching effectiveness, for example, peer and supervisor teaching evaluations as well as teaching portfolios (Marsh, 1984; Marsh & Roche, 1997; Zabaleta, 2007).