The pervasive under-representation of women researchers, specifically in tenured and tenure-earning faculty positions in Science, Technology, Engineering and Mathematics (STEM) (Bilen-Green et al., 2008; Lariviere et al., 2013; Shen, 2013), along with various challenges women face in their academic career progression (Bedi et al., 2012; Clauset et al., 2015; Edmunds et al., 2016; Handelsman et al., 2005; Moss-Racusin et al., 2012; Quadlin, 2018), calls for continued research on gender equity in academic settings. One important form of gender inequity is pay inequity. Academic researchers are expected to be paid equitably based on their research productivity (i.e., pay-for-productivity). Nonetheless, are men and women really paid equally for the same level of research productivity? Or is pay-for-productivity just a myth for women in tenured and tenure-track faculty positions? If gender inequity of pay-for-productivity exists, women are likely to be discouraged from continuing their careers in academia, which may help explain the “leaky pipeline” (Clark Blickenstaff, 2005) problem seen in STEM as compared to Social and Behavioral Sciences (SBS) disciplines. To date, many studies have examined only gender differences in academic salary while controlling for productivity (Bellas, 1997; Euwals & Ward, 2005; Ginther & Hayes, 2003; Umbach, 2007), with mixed results, leaving gender differences in the strength of the pay-for-productivity relationship unexamined. In other words, it is unclear whether the gender pay gap depends on a faculty member’s productivity level. Drawing from theory and research on social roles, we further examine gender differences in pay-for-productivity in STEM and SBS disciplines.

In the present research, we aim to address three questions regarding pay-for-productivity in academic settings: (1) whether, and if so, how strongly, research productivity is positively related to researcher pay (i.e., the intensity of pay-for-productivity), (2) whether productivity is more strongly tied to pay for men than for women (i.e., interaction of gender and pay-for-productivity), and (3) whether gender inequity of pay-for-productivity, if any, is more severe in the STEM disciplines than in the SBS disciplines (i.e., disciplinary difference in gender inequity of pay-for-productivity).

Pay-for-productivity

Pay-for-productivity, from a work motivation perspective, is deemed fair by many workers and motivates them to achieve desired results (Lawler, 1971; Maier, 1955). Meta-analytic studies suggest performance-contingent pay is among the best methods for boosting performance levels (Rynes et al., 2004, 2005). In academic institutions classified as R1 by the Carnegie Classification of Institutions of Higher Education, research constitutes the most important job responsibility and is a significant factor determining tenure success, promotions, and pay raises across a host of academic disciplines (Fairweather, 2005). Thus, besides their intrinsic motivation, academic researchers’ extrinsic motivation to produce research is, to some degree, driven by the extent to which their research productivity is linked to their pay. The University of Arkansas for Medical Sciences introduced a performance-based incentive plan for its College of Medicine in 2005 (Reece et al., 2008). With faculty pay directly linked to productivity, performance increased drastically, leading to a total compensation increase of about 20%, in addition to increases in external funding and researchers’ morale and satisfaction (Reece et al., 2008).

Some previous studies focused on whether men and women researchers receive equal pay while controlling for factors such as academic ranks, leadership positions (Jagsi et al., 2012), and raises (Lindley et al., 1992) as proxies for research productivity. Others have controlled for productivity through the number of publications (e.g., number of articles or books; Bellas, 1997; Euwals & Ward, 2005; Ginther et al., 2003; Levin & Stephan, 1998; Umbach, 2007), without any measure of the quality of the publications. In contrast, we explicitly measure research productivity with the h-index and investigate whether higher research productivity (and quality) translates into higher pay to the same extent for men and women in academia (i.e., pay-for-productivity). A researcher’s h-index has become one of the most widely used metrics to quantify scholarly productivity. Introduced 15 years ago by Hirsch, it refers to the number of publications (h) that have received at least h citations each (Hirsch, 2005). For example, a researcher who has ten publications with at least ten citations each (with all other publications having fewer than ten citations each) would have an h-index of 10. Although the popularity of this index has skyrocketed, researchers have acknowledged its shortcomings, including susceptibility to inflation through self-citations (Bartneck & Kokkelmans, 2011; Zhivotovsky & Krutovsky, 2008), a bias in favor of more established researchers (Hirsch, 2005), no adjustment for multiple authorship or order of authors, and no normalization of differential citation practices between disciplines (Alonso et al., 2009). Regardless of these drawbacks, the h-index is a single, easily calculable number that incorporates both a measure of quantity in the number of publications and a proxy for quality in terms of number of citations, and it is widely used as a decision-making tool within higher education for hiring and tenure (Barnes, 2017; Scruggs et al., 2019). Therefore, its effect on compensation should be examined to determine the full utility of this metric.
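
To make the definition of the h-index concrete, the following is a minimal sketch of how it could be computed from a list of per-publication citation counts. The function name `h_index` and the example citation counts are illustrative assumptions; they are not taken from the data or code used in this study.

```python
def h_index(citations):
    """Return the largest h such that h publications have at least h citations each."""
    # Rank publications from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` publications all have >= `rank` citations
        else:
            break
    return h


# Hypothetical researcher: ten publications with at least ten citations each,
# and all remaining publications with fewer than ten citations.
example_citations = [25, 18, 15, 14, 12, 11, 10, 10, 10, 10, 4, 2, 0]
print(h_index(example_citations))  # 10
```

Because the h-index depends only on ranked citation counts, it ignores author order, co-authorship, and discipline-specific citation norms, which is consistent with the shortcomings noted above.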

Hypothesis 1

Research productivity is positively related to researcher salary in STEM and SBS disciplines.

Gender differences in pay-for-productivity

Researchers who identify as men earn around 20% more than their women peers (Carlin et al., 2013; Jagsi et al., 2012; Lindley et al., 1992). Despite shifts in the distribution of men and women researchers in faculty rank, the gender pay gap has not diminished in the last 10 years. In 2020, on average across all disciplines, assistant professors who identify as women made $7,605 less than their peers who identify as men, and this difference more than doubled at the full professor level, with women full professors making $19,030 less than full professors who are men (The Annual Report on the Economic Status of the Profession, 2019–2020, 2020). Disparities between disciplines may partly explain these gender differences, as higher paying disciplines (e.g., biological sciences, engineering, and mathematics) tend to have more researchers who are men, whereas lower paying disciplines (e.g., English, sociology, and gender studies) tend to have more women researchers (Shulman et al., 2017). However, even in disciplines with a high proportion of women, there is still gender pay inequity, and thus differences in average discipline pay cannot entirely explain gender pay inequity. One study reported that men in disciplines one standard deviation above the mean in representation of women earn approximately $75,000, versus $69,000 for women (Umbach, 2007).

Another partial explanation for gender pay inequity has focused on the “productivity puzzle” of women having lower average productivity levels (Cole & Zuckerman, 1984; West et al., 2013; Xie & Shauman, 1998). A plethora of contributing factors have been examined to possibly explain women’s lower productivity levels, including family responsibilities (Ceci & Williams, 2011; Fox, 2005; Hunter & Leahey, 2010), resource allocations (Duch et al., 2012), and research specialization (Leahey, 2006). However, recent analyses of archival data suggest no gender differences in journal acceptance of publications, nor in productivity levels when controlling for structural differences, implying that when given equal resources, men and women publish equally well (Ceci & Williams, 2011; Huang et al., 2020). While investigating gender differences in productivity levels is an important research topic, in the current study we do not examine why such differences may occur, but rather whether men and women are paid equitably for their individual productivity level. Research on whether the gender salary gap in academia disappears after controlling for productivity is mixed (Bellas, 1997; Euwals & Ward, 2005; Ginther et al., 2003; Umbach, 2007). Only one study to date has examined gender differences in the pay-per-performance relationship in specific STEM disciplines (physics, earth science, and physiology), and it found that women were paid more per publication than men, but only in physics (Levin & Stephan, 1998). In addition to the data being from the 1970s, the authors only examined the change in salary over a two-year period, likely missing crucial overall salary differences.

Gender differences in pay-for-productivity can manifest in two ways. First, expectations for women’s performance that are grounded in social role theory may emphasize their communal roles as mentors, rather than their productivity or agentic characteristics (Cejka & Eagly, 1999; Koenig & Eagly, 2014). In cases where women do not adhere to gender role expectations, such expectations may still lead them to be perceived as less productive and competent, and as having lower status, than men (England, 1992; Heilman, 2001); therefore, women are not paid as much as men when they perform well. Second, although women are encouraged to negotiate their salary and other employment terms, women researchers’ salary negotiations or requests for salary adjustments are less likely to succeed than those of men (Leibbrandt & List, 2015). Women tend to anticipate backlash for their salary negotiation or request attempts; therefore, they may either opt not to initiate salary negotiations or requests, or lower their aspirations if they do initiate them (Amanatullah & Morris, 2010; Amanatullah & Tinsley, 2013). Women’s salary negotiation attempts are sometimes viewed as aggressive acts and frequently invite hostile reactions from others (Rudman et al., 2012). Because of gender bias in salary negotiations disfavoring women, we argue that research productivity does not translate into women researchers’ pay as much as into men researchers’ pay.

In the current study we focus on research productivity in STEM and SBS fields and examine gender differences in the strength of pay-per-productivity; that is, we look at gender differences in the relationship between h-index and salary (not just changes in salary). Looking at gender differences in pay-per-productivity allows us to examine whether gender pay inequity differs across levels of productivity. If women are paid according to stereotypes, then women who have low productivity will be paid the correct amount, but highly productive women will be underpaid because they are assumed to be underproductive (i.e., perceived productivity mismatches actual productivity). Thus, we expect that there will be gender salary differences at high performance levels and not at low performance levels.

Hypothesis 2

The link between research productivity and researcher salary is stronger among men researchers than among women researchers, such that men are paid more per h-index point and gender pay inequity is larger at higher levels of productivity.

STEM vs SBS

Our final inquiry pertains to the disciplinary difference in gender inequity of pay-for-productivity. If this inequity does exist, does it vary across academic disciplines? Specifically, is the hypothesized inequity more severe in disciplines where women are traditionally under-represented than in other disciplines? Women are less likely to enter STEM, feel less welcomed in these disciplines, and are less likely to stay in tenured or tenure-earning positions in these disciplines (Clauset et al., 2015; Edmunds et al., 2016; Handelsman et al., 2005). Furthermore, some evidence suggests that the gender pay gap is larger in STEM disciplines (Umbach, 2007; Xu, 2015) than in other disciplines, even when researchers control for gender differences in productivity. We postulate that women’s difficulty in effectively negotiating compensation is more pronounced in STEM disciplines than in other disciplines such as social and behavioral sciences (SBS), where we expect this gender inequity to be less severe.

In support of our expectations, social role theory (Eagly, 1987) suggests that gender roles prescribe what men and women should be like and provide gendered rules and norms based on which behaviors are judged and rewarded or socially sanctioned. Men are expected to be achievement-oriented, competitive, and analytic, whereas women are expected to be warm, considerate, and accommodating (Eagly & Karau, 2002; Heilman, 2001). Women are not expected to pursue STEM; instead, they are more expected to pursue SBS such as psychology, communication, sociology, etc. (Clark Blickenstaff, 2005; Handelsman et al., 2005). Women in STEM disciplines violate such gender role expectations and thus face unfavorable evaluations and other social sanctions. In contrast, women researchers in SBS disciplines are less likely to violate gender role expectations and thus may face fewer negative consequences. Such gender role expectations are particularly strong in fields dominated by men such as STEM disciplines as the norms are shaped by men. Women researchers who are achievement-oriented, competitive, and analytic inevitably violate gender role expectations and thus face social sanctions including unfavorable evaluations and social exclusion. These gender role expectations coupled with stereotypes of women as low performers could result in lower female salaries relative to male salaries, but only for high performing women in STEM disciplines, as women with lower productivity are meeting prescriptive gender stereotypes. Thus, we would expect stereotyping of productivity and gender differences in negotiation tactics to affect the salaries of highly productive women in academic STEM disciplines.

Hypothesis 3

The gender difference in the link between research productivity and researcher salary is larger in STEM versus SBS disciplines.

Materials and methods

We collected research productivity and salary data of 3033 tenured and tenure-earning faculty members from 17 universities across the United States. Department chairs were excluded from the analyses. Our criteria for the university selection were based on a study conducted for a National Science Foundation ADVANCE institutional transformation project. The selected data collection sites were large public universities in urban settings that were classified as R1 institutions (i.e., highest research activity by the Carnegie Classification of Institutions of Higher Education). Among these universities, we selected those that made salary data publicly available. In the first step, coders manually searched department websites of all 17 universities, and created a database combining researchers’ gender and discipline information and their demographic information retrieved from their publicly available CVs. In the second step, we used an automated approach to scrape each researcher’s research productivity information (h-index) from Google Scholar, and collected salary data from websites reporting current 9-month faculty salaries.

Measures

Gender

The coders utilized a combination of photographs available on departmental websites and names to code each researcher’s gender (1 = woman, 0 = man).

Research productivity

Research productivity was indicated by the h-index in 2019 (Hirsch, 2005), which was scraped from each tenured and tenure-earning faculty member’s Google Scholar profile. The h-index is among the most widely used metrics for research productivity, with h being the number of papers a researcher has authored or co-authored that have each accumulated at least h citations (Hirsch, 2005).

Salary

We collected the 9-month faculty salary data from various websites containing university-published current faculty salaries, as noted earlier.

Controls

We controlled for the number of years since the attainment of Ph.D. (i.e., post-Ph.D. years) at the individual level and the following department level controls by utilizing group-mean centering in our multilevel models: proportion of women in department, average department years since the attainment of Ph.D. (i.e., post-Ph.D. department tenure); and mean of h-indices within each department. Our random intercepts multilevel model inherently controlled for the average salary level of the department. We controlled for post-Ph.D. years to ensure that salary increases were attributed to increases in research productivity rather than just researchers’ tenure in their discipline. Our multilevel controls ensured we controlled for university and discipline differences because department averages will be affected by both.
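
As an illustration of group-mean centering, the sketch below subtracts each department’s mean from the values of its members. The helper `group_mean_center`, the field names, and the five faculty records are hypothetical and serve only to show the transformation; they are not the study’s data or code.

```python
from collections import defaultdict
from statistics import mean


def group_mean_center(rows, group_key, value_key):
    """Return each row's value minus the mean of that value within the row's group."""
    values_by_group = defaultdict(list)
    for row in rows:
        values_by_group[row[group_key]].append(row[value_key])
    group_means = {group: mean(vals) for group, vals in values_by_group.items()}
    return [row[value_key] - group_means[row[group_key]] for row in rows]


# Hypothetical records: department label, h-index, and gender (1 = woman, 0 = man).
faculty = [
    {"dept": "A", "h_index": 10, "woman": 1},
    {"dept": "A", "h_index": 30, "woman": 0},
    {"dept": "B", "h_index": 40, "woman": 1},
    {"dept": "B", "h_index": 50, "woman": 1},
    {"dept": "B", "h_index": 60, "woman": 0},
]

centered_h = group_mean_center(faculty, "dept", "h_index")
centered_gender = group_mean_center(faculty, "dept", "woman")
print(centered_h)
print(centered_gender)
```

After centering, a value of zero means that a faculty member sits exactly at the department average, so coefficients on centered predictors describe within-department differences. For the gender indicator, the department mean is simply the proportion of women in that department.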

Results

Descriptive statistics

Table 1 presents descriptive statistics and correlations among post-Ph.D. years, the h-index, and salary. Correlations are presented separately for men and women researchers. The average salaries of men and women researchers were $133,092.40 and $118,459.20, respectively. Women, on average, made 89 cents for every dollar made by men. With 95% confidence, the average salary for men was $10,850.63 to $18,415.71 more than that of women researchers (i.e., 9.16% to 15.55% more than the average salary for women). The gender difference in the h-index may partially explain this gender gap in salary: with 95% confidence, men’s average h-index was 5.32 to 8.33 points higher than that of women. The gender difference in the h-index could, in turn, partially be explained by the gender difference in post-Ph.D. years: with 95% confidence, men had 3.80 to 5.51 more post-Ph.D. years than women.
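
The ratio and percentage figures in this paragraph follow directly from the reported means and confidence bounds. The short sketch below reproduces that arithmetic; the variable names are illustrative, and only the dollar amounts come from the text.

```python
# Reported average salaries (from the descriptive statistics above).
mean_salary_men = 133092.40
mean_salary_women = 118459.20

# Women's earnings per dollar earned by men.
ratio = mean_salary_women / mean_salary_men

# Reported 95% confidence bounds for the men-minus-women salary difference.
ci_low, ci_high = 10850.63, 18415.71

# Express the bounds as a percentage of the average salary for women.
pct_low = 100 * ci_low / mean_salary_women
pct_high = 100 * ci_high / mean_salary_women

print(round(ratio, 2))     # 0.89
print(round(pct_low, 2))   # 9.16
print(round(pct_high, 2))  # 15.55
```

Note that the percentages use the average salary for women as the denominator, which matches the phrasing “more than the average salary for women.”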

Table 1 Descriptive Statistics and Correlations

Multilevel regression analyses

We tested our hypotheses by conducting multilevel regression analyses, given that our data were nested within academic departments (e.g., Psychology department at the University of Houston). We centered gender, post-Ph.D. years, and h-index by their respective group (department) means (Enders & Tofighi, 2007); for gender, the department mean is the proportion of women. In all reported models, for the sake of parsimony, we did not enter the department means of gender, post-Ph.D. years, and the h-index as predictors because (a) we did not hypothesize the effects of these department means, and (b) inclusion or exclusion of these department means did not change the result patterns, presumably because we group-mean centered. The estimated ICC for salary of 22.47% (i.e., 22.47% of the variance in salary could be explained by cross-department differences) further justified our use of multilevel regression analyses. Department-level salary variability can be explained by both university and discipline differences. Table 2 presents the results of the multilevel regression analyses, with profile confidence intervals being reported in the main text. The baseline model included two control variables: post-Ph.D. years and gender (1 = woman, 0 = man), with the former being a significant predictor of salary (B = 2,186.66, t = 35.22, p < 0.01).
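
For readers less familiar with random-intercept models, the following is a hedged sketch of the kind of specification described above, written for faculty member i in department j. The notation (the coefficient symbols, the variance components, and the inclusion of the h-index term as in Model 1) is an illustrative assumption and not necessarily the exact specification that was estimated.

```latex
\begin{align}
% Level 1: faculty member i in department j, predictors group-mean centered
\text{Salary}_{ij} &= \beta_{0j}
  + \beta_{1}\,(\text{Years}_{ij} - \overline{\text{Years}}_{j})
  + \beta_{2}\,(\text{Gender}_{ij} - \overline{\text{Gender}}_{j})
  + \beta_{3}\,(h_{ij} - \bar{h}_{j}) + e_{ij},
  \qquad e_{ij} \sim N(0, \sigma^{2}) \\
% Level 2: department-specific intercept around the grand mean
\beta_{0j} &= \gamma_{00} + u_{0j},
  \qquad u_{0j} \sim N(0, \tau_{00}) \\
% Intraclass correlation: share of salary variance lying between departments
\text{ICC} &= \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
\end{align}
```

Under this notation, an ICC of 22.47% means that the between-department variance accounts for 22.47% of the total variance in salary. The ICC is conventionally computed from an intercept-only model, which is assumed here.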

Table 2 Multi-level regression analysis for hypotheses 1–3

In line with Hypothesis 1, researchers’ h-index, indicative of their research productivity, was positively related to their salary level (see Model 1, Table 2). On average, a one-point increase in the h-index translated into a salary increase of $1,000.46 (t = 22.17, p < 0.01), with its 95% confidence interval [$912.01, $1,088.90]. We did not find support for Hypothesis 2. Specifically, the interaction between gender and the h-index was not significant (Model 2: B = -120.70, t = -1.17, p = 0.24). In other words, pay-for-productivity did not differ significantly between men and women researchers when examining both STEM and SBS disciplines simultaneously. Finally, we found support for Hypothesis 3 regarding gender inequity of pay-for-productivity in STEM versus SBS disciplines; the three-way interaction among gender, the h-index, and academic discipline dummy (STEM vs. SBS) was negatively related to researchers’ salary level (Model 3: B = -397.75, t = -1.86, p = 0.063).

We then probed the two-way interaction between gender and the h-index separately for STEM and SBS disciplines. For the latter, gender inequity of pay-for-productivity was not significant (B = 141.80, t = 0.76, p = 0.45). However, for the former, pay-for-productivity was unfavorable to women versus men (B = -266.66, t = -2.13, p = 0.03). On average, in STEM disciplines, men were paid $266.66 (95% confidence interval [$20.95, $512.61]) more than women for each one-point increment in h-index. Figure 1 shows the interaction between gender and the h-index for both STEM (Fig. 1a) and SBS (Fig. 1b) disciplines using group mean centered variables. As demonstrated, for STEM disciplines, as h-index increases, predicted salary for men is higher than for women.

Fig. 1 Relationship between h-index and salary for STEM and SBS researchers. Plots were generated using group mean centering for h-index and gender. Ranges for both axes have been fixed to allow for comparison

Discussion

The present research reveals gender inequity of pay-for-productivity in STEM disciplines. Consistent with work motivation theories (Rynes et al., 2004, 2005), we did find that researchers’ salary is coupled with their research productivity as intended, but this pay-productivity coupling was more favorable to men than to women, particularly in STEM disciplines. It is interesting to note that previous research demonstrated that high performing women in STEM may need to overcompensate (i.e., build more relationships, acquire more knowledge, or put in more research hours) to achieve the same level of productivity indicators as their male colleagues (Aguinis et al., 2018). Thus, not only is the road to becoming a “star” performer more difficult for women, but they also may not see the same returns in compensation for their research investments. Women researchers in STEM with an h-index of 49 (one standard deviation above the mean) made around six thousand dollars less than men researchers in STEM with the same h-index. Our study did not follow researchers longitudinally, but we can tentatively extrapolate how a six-thousand-dollar salary gap can add up over the years (i.e., over a ten-year period this difference would add up to sixty thousand dollars). Depending on how her h-index develops over her career, a highly productive woman researcher in STEM could experience even more pay inequity.

As with any paper, our study is not without limitations. In contrast to studies examining pay differences in non-Western cultural contexts (Takahashi et al., 2018), our study focused on North American academics. Even so, we expect basic social psychological processes grounded in role theory expectations, as well as gender differences in negotiation behaviors and negotiation outcomes, to be similar across cultural contexts. However, in countries where compensation is more strongly driven by federally or locally imposed pay rates, productivity-compensation differences should be weaker across gender. We recommend that subsequent research account for cultural contexts and structural differences in compensation structures in academic settings to examine the external validity of our findings across cultural contexts. Also note that in our paper, we aimed to determine linear relationships between productivity and compensation and the moderating role of gender. Hence, for more nuanced analyses, including analyses of star performers’ performance (Aguinis et al., 2018) and compensation, or of non-linear effects, we recommend that researchers build large, multi-university consortium structures to access data sets large enough to conduct meaningful analyses of a non-linear nature or on subsets (e.g., star performers, faculty of color, faculty with intersectional identities).

Our finding lends support to funding agencies’ (e.g., the National Science Foundation) efforts to reduce gender inequity in STEM disciplines (Ceci & Williams, 2011) and yet reveals the lingering challenge inherent in these efforts. Given that our analyses relied on archival data, we could not accurately code the race/ethnicity of researchers and thus did not include this demographic factor in our analyses. However, we speculate that pay-for-productivity may further disadvantage those with intersectional identities, such as women of color in STEM disciplines. Given that our focus was on determining whether there is a gender inequity of pay-for-productivity across disciplines, we offer some plausible explanations without testing these explanatory mechanisms. Future research should hence shed light on these possible mechanisms to ultimately identify ways to close gaps. For example, the weaker pay-for-productivity relationship for women in STEM may be a result of fewer women attempting to continuously renegotiate their salary. Alternatively, men may be more likely to seek offers from other institutions, and their salary may benefit as a result. Last, it may be possible that women’s attempts to renegotiate their salary based on incremental performance result in negative reactions from administrators at the departmental, college, and university levels.

In our analyses we used the h-index as an indicator of research productivity. We encourage future researchers examining the productivity-pay link to use broader or supplemental indices of productivity, such as external funding records and total citations. Even though the h-index is a widely known metric for research productivity and is used as a decision-making tool, it is not without weaknesses. For instance, intentional manipulation of the h-index by researchers through self-citations or inclusion of work authored by others may render the metric problematic for exclusive use as a research productivity indicator.

We further urge universities to regularly conduct internal analyses to identify and correct potential gender inequity of pay-for-productivity. Likewise, professional associations in STEM disciplines should regularly conduct such analyses to address the weaker pay-for-performance relationships we observed for women in our study. Notably, we do not intend to assert that the h-index should be treated as the benchmark for research productivity, as it is not problem- or concern-free. However, the h-index is to the measurement of scholarly productivity what democracy is to forms of government: the least problematic. We also urge universities to continuously assess whether high levels of research productivity translate into high pay at similar rates for men and women; the alternative may be to continue to lose women scientists despite high productivity levels and potential. The dearth of women, especially in senior academic/faculty positions in STEM, continues to pose a significant challenge for the science and technology workforce in the twenty-first century. To attract more women to enter STEM disciplines and help them be more engaged and thrive in these disciplines and their organizations, universities should, first and foremost, effectively address the ostensibly “sticky” problem of gender inequity of pay-for-productivity.