Researchers preferentially collaborate with same-gendered colleagues across the life sciences

Evidence suggests that women in academia are hindered by conscious and unconscious biases, and often feel excluded from formal and informal opportunities for research collaboration. In addition to ensuring fairness and helping to redress gender imbalance in the academic workforce, increasing women’s access to collaboration could help scientific progress by drawing on more of the available human capital. Here, we test whether researchers preferentially collaborate with same-gendered colleagues, using more stringent methods and a larger dataset than in past work. Our results reaffirm that researchers preferentially co-publish with colleagues of the same gender, and show that this ‘gender homophily’ is slightly stronger today than it was 10 years ago. Contrary to our expectations, we found no evidence that homophily is driven mostly by senior academics, and no evidence that homophily is stronger in fields where women are in the minority. Interestingly, journals with a high impact factor for their discipline tended to have comparatively low homophily, as predicted if mixed-gender teams produce better research. We discuss some potential causes of gender homophily in science.

A high, steadily increasing proportion of research papers is written by more than one author [3], making collaboration a key predictor of publication output, and thus of career prospects [40,41]. Additionally, empirical studies imply that mixed-gender or otherwise diverse teams produce better outputs on collaborative tasks than less diverse teams [42-48]. For reasons such as these, multiple studies have examined the author lists of published research articles in order to test for gender differences in collaboration frequency or pattern. To our knowledge, most or all such studies imply that men co-publish with men, and women with women, more often than expected if collaborators assort randomly with respect to gender [49-58]. This pattern of assortative publishing has often been termed 'gender homophily'.
However, we believe that prior studies of gender homophily were hindered by a largely unacknowledged statistical issue that we name the Wahlund effect (Figure 1), by analogy with the conceptually similar Wahlund effect in population genetics [59]. The Wahlund effect makes it deceptively difficult to infer gender-based preferences simply by counting the number of same- and mixed-gender coauthorships. Essentially, whenever coauthorship data are sampled from two or more discrete sets of literature, which vary in the author gender ratio and which are largely not connected by collaboration, the number of same-gendered coauthors will be inflated. This can give the impression that authors preferentially publish with same-gendered colleagues even if no gender preferences exist, or if the true preference is for opposite-gendered colleagues ('gender heterophily'). For example, a sample of literature containing bioinformatics and cell biology papers will probably contain an excess of mostly-male and mostly-female author lists, simply because researchers usually collaborate within their own discipline, and because the author gender ratio is more male-biased in bioinformatics than in cell biology [5].
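The inflation described above can be illustrated numerically. The following is a minimal sketch, not the analysis used in this study: it assumes two-author papers, two equally sized and disconnected subsets of literature with hypothetical male author fractions of 0.8 and 0.4, and random assortment by gender within each subset. It also assumes that the coefficient of homophily can be written as the proportion of men's coauthors who are men minus the proportion of women's coauthors who are men. All function names are illustrative.

```python
def random_assortment_joint(male_fraction):
    """Joint probabilities (MM, MF, FM, FF) of (focal author, coauthor) genders
    when coauthors are drawn at random with respect to gender."""
    m = male_fraction
    f = 1.0 - m
    return (m * m, m * f, f * m, f * f)


def alpha_from_joint(p_mm, p_mf, p_fm, p_ff):
    """Assumed definition: P(coauthor is a man | focal author is a man)
    minus P(coauthor is a man | focal author is a woman)."""
    p = p_mm / (p_mm + p_mf)
    q = p_fm / (p_fm + p_ff)
    return p - q


def pooled_joint(subsets):
    """Mix several subsets; `subsets` is a list of (weight, male_fraction)
    pairs whose weights sum to one."""
    total = [0.0, 0.0, 0.0, 0.0]
    for weight, male_fraction in subsets:
        for i, value in enumerate(random_assortment_joint(male_fraction)):
            total[i] += weight * value
    return tuple(total)


# Hypothetical male-biased and female-biased subsets (illustrative values only).
alpha_subset_a = alpha_from_joint(*random_assortment_joint(0.8))
alpha_subset_b = alpha_from_joint(*random_assortment_joint(0.4))
alpha_pooled = alpha_from_joint(*pooled_joint([(0.5, 0.8), (0.5, 0.4)]))

print(alpha_subset_a, alpha_subset_b, alpha_pooled)
```

Under these assumptions, each subset has a coefficient of zero (up to floating-point error), whereas the pooled sample has a coefficient of about 0.17, despite the absence of any gender preference.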
In the present study, we test whether life sciences researchers tend to co-publish with same-gendered colleagues, while controlling for the Wahlund effect as strictly as possible. We use a
recently-published dataset describing the gender of 35.5m authors from 9.15m articles indexed on PubMed [5]. Holman et al. [5] reported large differences in the gender ratio of authors across research disciplines, journals, countries, and across the years 2002-2016. We therefore tested for gender homophily while restricting our analysis to particular journals (i.e. research specialties), time periods, and countries. We quantified gender assortment using a metric called α [60], which is positive when same-gender authors publish together more often than expected (gender homophily), negative when opposite-gender authors publish together more often than expected (heterophily), and equal to zero when authors assort randomly with respect to gender (see Methods).

Variance in homophily between disciplines
All disciplines had positive mean α′, although homophily appeared somewhat stronger in some disciplines than others (e.g. mean α′ was 0.12±0.02 for Urology journals and 0.03±0.01 for Veterinary Medicine journals; Figure 2, S4 Data). However, there was no formal evidence for consistent differences in α between disciplines: the random factor 'Discipline' explained around 1% of the variance in α in the two linear mixed models described in the previous section (see Figure 2 and mixed models in Online Supplementary Material). Thus, the processes responsible for producing positive α values appear to be similarly strong in all the disciplines we examined.
There was no indication that the Wahlund effect gives journals publishing on a wide range of topics higher α values than more specialised journals. For example, the journal category 'Multidisciplinary', which includes journals like PLoS ONE, Nature, Science, and PNAS, did not have notably elevated α (Figure 2). This result suggests that our estimates of homophily, and estimates from some earlier studies, are not notably inflated by the presence of disparate research topics (with variable author gender ratios) being published within individual journals.

Relationship between gender homophily and number of authors
Papers with two authors had significantly lower (but still positive) α values relative to papers with more than two authors, on average (Figure 3; statistical results in Online Supplementary Material). Papers with 3, 4 or 5+ authors had essentially identical average α values. The variance in α across journals was also a little higher for 2-author papers compared to the remainder (Figure 3), though part of this variance is due to the reduced sample size (in terms of number of authors) for the 2-author papers.

Relationship between gender homophily and gender ratio
We next tested whether researchers are more or less likely to publish with same-gendered colleagues in strongly gender-biased disciplines (e.g. Surgery or Nursing), relative to disciplines with a comparatively gender-balanced workforce (e.g. Psychiatry). We found a positive, nonlinear relationship between the overall gender ratio of all authors publishing in a particular journal [5], and the estimated value of α for all authors and for first authors (Figure 4).
Journals with a balanced or female-biased author gender ratio tended to have higher α than journals with a male-biased author gender ratio (GAM smooth terms p < 0.001; Online Supplementary Material). The relationship was not statistically significant when α was calculated for last authors (GAM, p = 0.142), though the trend appeared similar (Figure 4).

Figure 4: There is a weakly positive, non-linear relationship between the gender ratio of authors publishing in a journal, and the coefficient of homophily (α′). Specifically, journals with 50% women authors or higher tended to have more same-sex coauthorships than did journals with predominantly men authors. This relationship held whether α was calculated for all authors, first authors only, or last authors only. A negative value on the x-axis denotes an excess of men authors, a positive value denotes an excess of women authors, and zero denotes gender parity. The lines were fitted using generalised additive models with the smoothing parameter k set to 3. Axes: % women authors relative to gender parity (x) and coefficient of homophily, α′ (y).

Relationship between journal impact factor and gender homophily
We observed a noisy but statistically significant linear relationship between standardised journal impact factor and α′, such that journals with a high impact factor for their discipline had weaker gender homophily than did journals with a low impact factor for their discipline (Figure 5; linear regression: R² = 0.043, t₁₄₁₅ = −8.0, p < 0.0001). The slope of the regression was −0.012±0.0015, indicating that increasing the discipline-standardised impact factor by one standard deviation is associated with a reduction in α′ of 0.012.
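As a quick consistency check on the statistics reported above, the short sketch below relates the quantities to one another. It assumes a simple linear regression with a single predictor, for which R² = t² / (t² + df) holds, and it assumes that the ± value attached to the slope is a standard error; neither assumption is stated explicitly in the text.

```python
# Reported values (taken from the regression summary in the text).
t_stat = -8.0
residual_df = 1415
slope = -0.012
slope_se = 0.0015  # assumed to be the standard error of the slope

# For a one-predictor linear regression, R^2 can be recovered from t and df.
r_squared = t_stat ** 2 / (t_stat ** 2 + residual_df)

# The t statistic is the slope divided by its standard error.
t_from_slope = slope / slope_se

print(round(r_squared, 3), round(t_from_slope, 1))
```

Under these assumptions the reported R², t statistic, and slope are mutually consistent, which also underlines how weak the relationship is: impact factor accounts for only about 4% of the variance in homophily.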

Analysis accounting for differences in author gender ratio between countries
When we restricted the analysis by country, we observed statistically significant homophily for 72 of the 325 journal-country combinations tested (64 unique journals and 18 unique countries), and no significant heterophily (S4-S5 Fig). Additionally, the values of α calculated for each journal-country combination were only very slightly lower than the α values calculated for the journal as a whole (i.e. when pooling papers from different countries, as was done to make Figure 2): on average, the difference in α was only 0.002 (S6 Fig). These results suggest that our findings of widespread homophily in the main analysis were not driven solely by a Wahlund effect resulting from gender differences between countries.

Theoretical expectations for α when the gender ratio differs between career stages
As shown in Figure 6, we predict that α will be non-zero even if collaborators are randomly selected with respect to gender, provided that there is a gender gap between career stages. The extent to which α deviates from zero depends on the relative frequencies of collaboration within and between career stages. When >50% of collaborations are between early and established researchers, we expect gender heterophily (α < 0). Conversely, when >50% of collaborations occur within career stages, we expect gender homophily (α > 0).
In a few regions of parameter space (shown in red; Figure 6), α was quite high, and overlapped with the values that we estimated (Figure 2).
Despite this overlap, Figure 6 suggests that our main conclusions (and those of other studies of gender homophily) are probably robust to this career stage issue. We only expect strongly positive α when A) the gender ratio is highly skewed across career stages (e.g. a 5-fold difference), and B) collaborations between early and established researchers are very rare (e.g. <10% of the total). Both of these conditions are untrue for most fields: the gender gap across career stages is generally less pronounced [1,5], and it is very common for early-career researchers to co-publish with an established mentor [61]. However, one can get α > 0 for realistic combinations of parameters, e.g. a moderate shortage of women in senior positions coupled with a moderate excess of within-career stage collaboration, suggesting this effect might contribute to some of the observed homophily (in this and previous studies).
Lastly, we note that if there is a gender gap between career stages and coauthorships between early-career and established researchers comprise >50% of the total, then the baseline expectation for α is actually less than zero (blue areas in Figure 6). Therefore, our results might under-estimate the extent to which researchers preferentially select same-gendered collaborators in some cases.

Figure 6: When there is a difference in gender ratio between early-career and established researchers, and collaboration is non-random with respect to career stage, the null expectation for α deviates from zero. An excess of collaborations between career stages gives the appearance of gender heterophily (lower rows, blue areas), while an excess of within-career stage collaborations produces apparent gender homophily (upper rows, red areas). However, the conditions required for strong gender homophily are quite restrictive, making it unlikely that this issue explains all of the homophily observed in Figure 2. Contour lines denote increments of 0.1.

Discussion
We found evidence that researchers preferentially publish with same-gendered coauthors, even after implementing stringent controls for Wahlund effects (Figure 1). Our study therefore reaffirms earlier studies' conclusions [49-57,62] and establishes their generality across the life sciences. Relatively few journals had α values below zero, and almost no journals showed statistically significant gender heterophily after controlling for multiple testing. The excess of same-gender coauthorships was quite large: many journals had α > 0.1, indicating that the gender ratio of men's and women's coauthors differs by >10% in absolute terms. In relative terms, our findings are even more striking: for example, if men have 20% female coauthors and women have 30% (i.e. α = 0.1 in a field with a typical gender ratio [5]), then women publish with women 50% more often than men do.
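The distinction between the absolute and relative differences in this example can be made explicit with a few lines of arithmetic. This is an illustrative sketch using the hypothetical 20% and 30% figures from the example; the variable names are ours.

```python
from fractions import Fraction

# Share of coauthors who are women, for men and for women (example values).
female_share_for_men = Fraction(20, 100)
female_share_for_women = Fraction(30, 100)

# Absolute difference in coauthor gender ratio, which equals alpha here.
alpha = female_share_for_women - female_share_for_men

# Relative excess: how much more often women publish with women than men do.
relative_excess = female_share_for_women / female_share_for_men - 1

print(alpha, relative_excess)  # 1/10 1/2
```

An absolute difference of 10 percentage points thus corresponds to a 50% relative difference when the baseline share of female coauthors is low.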
An important limitation of our study is that we cannot reliably determine the cause(s) of the observed excess of same-gender coauthorships. As well as the obvious interpretation (conscious or unconscious selection of same-gender collaborators by men, women, or both), our results could be partly explained by uncontrolled Wahlund effects. However, we suspect the contribution of these to be minor, for four reasons: we found positive α after controlling for three obvious sources of Wahlund effect; there was no inflation of α in highly multidisciplinary journals; restricting the data by country yielded similar estimates of α′; and we showed that differences in gender ratio between career stages are unlikely to fully explain our results. On balance, we believe the data suggest that some researchers do preferentially select same-gendered collaborators, although the frequency and strength of this preference are difficult to ascertain.
We hypothesised that disciplines with a strongly skewed gender ratio might show the strongest gender homophily, e.g. because being in the minority might increase motivation to seek out same-gendered colleagues. Contrary to this hypothesis, we found no evidence that gender homophily is restricted to particular disciplines: α was similarly high across the board (Figure 2). Interestingly, gender homophily was weakest for journals with a male-biased author gender ratio, and strongest in journals with a female-biased author gender ratio. This may suggest that men are more likely to preferentially seek out male collaborators in fields where men are a minority, relative to the homophily displayed by women in fields where women are a minority. However, this latter result is only tentatively supported since our sample contains few journals in which most authors are women (Figure 4).
We also found that gender homophily was marginally stronger in 2015-2016 relative to 2005-2006. Although this trend might reflect a change in the gender preferences of researchers seeking collaborators, there are alternative (and perhaps more likely) explanations. For example, this trend might result from the increasing number of women working in senior positions in STEMM over the past decade [63-65]. As shown in Figure 6, if enough coauthorships are between junior and senior researchers, a large gender gap between career stages can give the appearance of heterophily. As this gender gap between career stages lessens, the observed values of α may increase.
Regarding our finding of weaker homophily among 2-author papers, we suspect that many 2-author teams comprise a student/postdoc and a senior staff member, making these teams especially likely to be mixed-gender, due to the elevated gender gap among senior researchers [1,5]. Assuming this interpretation is correct, this result suggests that our reported α values may underestimate the strength of people's preferences for same-gendered collaborators; essentially, women seeking a senior collaborator could be constrained to work mostly with men, meaning that people's ideal and realised gender preferences would be mismatched.
On a related note, Ghiasi et al. [51] argue that women in engineering are "compliant [in reproducing] male-dominated scientific structures" because they do not collaborate often enough with other women (their Figure 7 suggests that coauthorships between women are 30% more frequent than expected under random assortment). By contrast, we feel that it is not helpful to recommend that women collaborate primarily with other women, e.g. because this constrains women's options and may be counter-productive (particularly in fields like engineering, where 90% of professors are men [1]). Instead, we suggest that researchers of both genders can help to close the gender gap in STEMM. In the context of collaboration, one way to do this is to undertake self-examination to ensure that one is not inadvertently overlooking or excluding potential female students and colleagues. One should also take care to treat male and female collaborators equally, e.g. in terms of training and mentoring, allocation of work, and how one frames or promotes the collaboration (e.g. in conference presentations or on a website); evidence suggests that unconscious bias causes people to undervalue women's research achievements [20], and possibly to assign menial or under-valued tasks to women and more prestigious tasks to men [61].
Our study raises two questions: what causes gender homophily in science, and are our results cause for concern? These questions are closely related. For example, some of the homophily we observed might be caused by women seeking to avoid harassment or sexism from men [38], which would clearly be concerning. Additionally, Sheltzer and Smith [66] concluded that 'elite' male academics (defined as recipients of major honours) have a higher proportion of male students and postdocs than non-elite male academics. This finding could contribute to the homophily we observed, and is cause for concern since the results of Sheltzer and Smith [66] might reflect discrimination against women during hiring [20], or avoidance by women of elite research groups (e.g. due to gender differences in confidence, or a perception that some groups are sexist). We also found a little evidence that gender homophily is detrimental to research quality, in that high-impact journals tended to have weaker homophily. Assuming that papers published in high-impact journals are of higher average quality [67], our results provide non-experimental support for the hypothesis that mixed-gender teams produce better research than single-gender teams [42-48]. Another issue is that if many collaborations are between established researchers, there will be an excess of male-male collaborations in fields where women in senior positions are rare; some of the observed homophily might therefore reflect the elevated gender gap among senior researchers.
On the other hand, homophily might have more benign causes. Collaboration is often most enjoyable and productive when working with like-minded people, who might be same-gendered more often than not. We also suppose that some people consciously choose to preferentially collaborate with women in order to help close the gender gap in the workforce; this would create homophily if women do this more than men. In support of this interpretation, women appear more likely than men to promote the work of female colleagues by inviting them to give talks [68,69]. Given that many collaborative research projects unfortunately involve a gendered division of labour [61], working with a same-gendered colleague may provide exposure to new parts of the research process, and (especially for the minority gender) a welcome change of pace.

Gender and coauthorship
necessitate culling the dataset to include only papers with a sufficiently long author list, complicating interpretation of the results.
We also calculated α for papers with 2, 3, 4 or ≥5 authors, for all journals that had at least 50 suitable papers from 2015-2016 with the specified author list length.
Our test assumes that the expected value of α is zero if authors randomly assort, but for small datasets this assumption is not always true (as pointed out by Carl Bergstrom in a blog post, http://www.eigenfactor.org/gender/assortativity/note_to_eisen.rtf). To borrow Prof. Bergstrom's example, consider a small research specialty comprising just two men and two women researchers, who have together produced six two-author papers: one in each of the six possible two-author combinations. For these six papers, α = −1/3, even though same- and opposite-gendered coauthors were selected in equal proportion to their frequency in the pool of possible collaborators.
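The value of −1/3 in this example can be reproduced with a short calculation. The sketch below assumes that α is the proportion of men's coauthors who are men minus the proportion of women's coauthors who are men, counted over all ordered (focal author, coauthor) pairs on each paper; the function name and data layout are illustrative rather than taken from our analysis code.

```python
from fractions import Fraction
from itertools import combinations, permutations


def coefficient_of_homophily(papers):
    """Compute alpha for a list of papers, each given as a list of author
    genders ("M" or "F"). Assumes at least one man and one woman appear
    on a multi-author paper, otherwise a proportion is undefined."""
    male_focal = male_focal_male_coauthor = 0
    female_focal = female_focal_male_coauthor = 0
    for genders in papers:
        # Every ordered pair of distinct author positions on the paper.
        for i, j in permutations(range(len(genders)), 2):
            coauthor_is_male = genders[j] == "M"
            if genders[i] == "M":
                male_focal += 1
                male_focal_male_coauthor += coauthor_is_male
            else:
                female_focal += 1
                female_focal_male_coauthor += coauthor_is_male
    p = Fraction(male_focal_male_coauthor, male_focal)
    q = Fraction(female_focal_male_coauthor, female_focal)
    return p - q


# Two men and two women, with one paper for each of the six possible pairs.
researchers = {"m1": "M", "m2": "M", "f1": "F", "f2": "F"}
papers = [[researchers[a], researchers[b]] for a, b in combinations(researchers, 2)]

alpha = coefficient_of_homophily(papers)
print(alpha)  # -1/3
```

The negative value arises because an author cannot be their own coauthor: each man has one possible male coauthor but two possible female coauthors, so random pairing in a very small pool yields an excess of mixed-gender papers.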
To control for the fact that the null expectation for α is not zero for small datasets, we devised an adjusted version of the coefficient of homophily, which we term α′. Every time we calculated α for a set of papers, we also determined the expected value of α under the null hypothesis that authors assort randomly with respect to gender. This was accomplished by randomly permuting authors across papers 1000 times, recalculating α, and taking the median. We then calculated α′ by subtracting the null expectation for α from the observed value. We also used the null-simulated α values to calculate a two-tailed p-value for the observed value of α; the p-value was defined as the proportion of null simulations for which

Minimising the Wahlund effect: research discipline and time period
To minimise bias in α due to the Wahlund effect, we restricted each set of papers to a single research specialty to the greatest extent allowed by our data. Specifically, we only calculated α for individual journals, since papers from the same journal typically focus on closely related topics. Although some journals, e.g. PLoS ONE, publish research from diverse disciplines with very different author gender ratios [5], calculating α for these highly multidisciplinary journals is still useful as a contrast. The difference in α between highly multidisciplinary and more specialised journals, e.g. PLoS ONE versus PLoS Computational Biology, gives an estimate of the extent to which multidisciplinarity inflates α′.
As well as varying between disciplines, the gender ratio of authors has changed markedly over time [5].Because the gender ratio was more male-biased in the past, α would be inflated if we calculated it for a sample of papers published over a long enough time frame.To minimise this effect, we only sampled papers from two one-year periods (namely 2005-6 and 2015-16).

gender would simply be an imperfect correlate of the true causal effect, while the last author's gender would be the causal effect itself.
To test whether α for last authors tends to be higher than α for first authors for any given dataset, we used a linear mixed model implemented in the lme4 and lmerTest packages for R, with authorship position (first or last) as a fixed factor, and journal and research discipline as crossed random effects. The response variable was α′, and we weighted each observation by the inverse of the standard error from our estimate of α′, meaning that more accurate measurements of α had more influence on the results. We used a similar model to test for a difference in α between the 2005-6 and the 2015-16 datasets, with two differences: we fit year range as a two-level fixed factor (instead of authorship position), and we used α estimated for all authors (not first/last authors) as the response variable.
The relationship between the gender ratio of authors publishing in a journal and its α value appeared nonlinear (see Results).We therefore fit a generalised additive model with thin plate regression spline smoothing, implemented using the mgcv package for R.
To model the relationship between α and the number of authors on the paper, we used a meta-regression model implemented in the R package brms [75]. The model incorporated the standard error associated with each estimate of α′, had author number as a fixed effect, and journal as a random intercept (to control for repeated measures of each journal). We also fit a random slope of author number within journal, thereby allowing the response to author number to vary between journals. We used the default (weak) priors. The full output of this model can be viewed in the Online Supplementary Material.

Theoretical expectations for α when the gender ratio differs between career stages
In many STEMM subjects, the gender ratio is more skewed among established researchers relative to early-career researchers [1,5].We hypothesised that this skew could potentially create both Wahlund effects and 'reverse' Wahlund effects.For example, imagine that the majority of collaborations are between students and professors, and that the gender ratio differs between career stages: we will then see an excess of mixed-gender coauthorships (heterophily, α < 0), even if gender has no direct, causal effect.Similarly, a hypothetical field in which students work only with students, and professors with professors, would have apparent gender homophily (α > 0).
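The two scenarios described above can be explored with a small null-model calculation. The following is a minimal sketch under simplifying assumptions that are ours: all papers have two authors, there are two career stages, gender is independent of coauthor choice within each stage, and α is the proportion of men's coauthors who are men minus the proportion of women's coauthors who are men. The parameter names and example values are hypothetical.

```python
def expected_alpha(women_early, women_late, within_early, within_late):
    """Expected alpha when gender has no causal effect on collaboration.

    women_early, women_late: proportion of women at each career stage.
    within_early, within_late: proportion of papers written by two
    early-career or two established researchers; the remainder are
    collaborations between the two career stages.
    """
    between = 1.0 - within_early - within_late
    if between < -1e-12:
        raise ValueError("within-stage proportions must sum to at most 1")
    f_e, f_l = women_early, women_late
    m_e, m_l = 1.0 - f_e, 1.0 - f_l

    # Probability that the focal author and the coauthor are both men.
    p_mm = within_early * m_e * m_e + within_late * m_l * m_l + between * m_e * m_l
    # Probability that the focal author is a man.
    p_m = within_early * m_e + within_late * m_l + between * (m_e + m_l) / 2
    # Probability that the focal author is a woman and the coauthor is a man.
    p_fm = (within_early * f_e * m_e + within_late * f_l * m_l
            + between * (f_e * m_l + f_l * m_e) / 2)
    p_f = 1.0 - p_m
    return p_mm / p_m - p_fm / p_f


# Hypothetical field: 50% women among early-career, 10% among established.
only_between = expected_alpha(0.5, 0.1, within_early=0.0, within_late=0.0)
only_within = expected_alpha(0.5, 0.1, within_early=0.5, within_late=0.5)
no_gender_gap = expected_alpha(0.3, 0.3, within_early=0.2, within_late=0.5)

print(only_between, only_within, no_gender_gap)
```

With these illustrative values, a field in which every paper pairs an early-career with an established researcher has a negative expected α, a field with only within-stage collaboration has a positive expected α, and the expectation is zero when the gender ratio is the same at both career stages.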
We can think of no tractable method of controlling for this issue using our dataset, which contains no information on career stage. Therefore, we instead decided to derive the theoretical expectations for α when there is a difference in gender ratio across career stages, in order to determine if and how this effect should alter our inferences. For simplicity, our calculations assume there are only two career stages, though we intuit that the general conclusions would also apply to a multi-tier career ladder. Under the null model that gender has no causal effect on collaboration, we calculated α for various combinations of the four free parameters, i.e. the gender ratios for early- and late-career researchers, and the relative frequency of

Figure 1: The Wahlund effect can make it appear as if authors prefer to publish with same-gendered colleagues, even if no such preference exists. Here, coloured circles represent male and female authors, and coauthors are linked with lines. Across the whole set of ten papers, there is an apparent excess of same-gender collaborations: there are six same-gender papers and only four mixed-gender papers, which is fewer than the 10 × 2 × 0.5 × 0.5 = 5 mixed-gender papers expected under the null hypothesis that authors assort randomly with respect to gender. However, within each subset, there is no evidence that authors prefer to publish with same-gendered individuals (if anything, this small dataset suggests gender heterophily). The Wahlund effect will tend to inflate the frequency of same-sex coauthorships whenever the data is composed of two or more disconnected subsets of literature with different author gender ratios; these subsets could be research disciplines, older versus newer papers, or papers from authors in different countries. The example countries and disciplines were selected based on [5].

Figure 2: Of the 2116 journals for which we had adequate data in 2015-2016, 825 showed statistically significant evidence of gender homophily (denoted by α > 0), and 1 showed statistically significant evidence of heterophily (α < 0), after false discovery rate correction. The white area shows the number of journals for which homophily was significantly stronger than expected under the null hypothesis (corrected p < 0.05), while the blue area shows all the remainder. Patterns were similar whether α was calculated for all authors, for first authors only, or for last authors only.

Figure 2 illustrates the variance in journal homophily values (α′) across scientific disciplines.

Figure 3: The coefficient of homophily (α′) was slightly less positive when calculated for two-author papers only, relative to papers with longer author lists. The individual points, whose distribution is summarised by the violin plots, correspond to individual journals. The larger white points show the mean for each group (and its 95% CIs), as calculated by a Bayesian meta-regression model accounting for repeated measures of α within journals, as well as the precision with which α was estimated.

Figure 5: Journal impact factor (expressed relative to the average for the discipline) is negatively correlated with α′. The relationship is noisy (R² = 0.043), but the results suggest that journals with strong homophily tend to have lower impact factors than journals with weak homophily in the same discipline.
|α_null| > |α_obs|. We applied false discovery rate (FDR) correction to each set of p-values to account for multiple testing [72]. As expected, α′ was usually almost identical to α (S7 Fig), but α was downwardly biased relative to α′ for small datasets (S8 Fig). Additionally, the correlation between α′ and sample size was negligible (R² < 0.01), suggesting that our calculation of α′ effectively removed the dependence of α on sample size. We therefore used α′ in all analyses.
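The permutation procedure used to obtain α′ and its p-value can be sketched as follows. This is an illustrative reimplementation rather than our analysis code (which was written in R): it assumes that α is the proportion of men's coauthors who are men minus the proportion of women's coauthors who are men, that authors are shuffled across papers while preserving each paper's author-list length, and that the toy dataset and function names are hypothetical. FDR correction across journals is not shown.

```python
import random
from itertools import permutations
from statistics import median


def coefficient_of_homophily(papers):
    """Alpha for a list of papers, each a list of author genders ("M"/"F")."""
    counts = {"M": [0, 0], "F": [0, 0]}  # [focal count, male-coauthor count]
    for genders in papers:
        for i, j in permutations(range(len(genders)), 2):
            counts[genders[i]][0] += 1
            counts[genders[i]][1] += genders[j] == "M"
    return counts["M"][1] / counts["M"][0] - counts["F"][1] / counts["F"][0]


def adjusted_alpha(papers, n_permutations=1000, seed=0):
    """Return (alpha_prime, p_value) from a gender-permutation null model."""
    rng = random.Random(seed)
    observed = coefficient_of_homophily(papers)
    pool = [gender for paper in papers for gender in paper]
    sizes = [len(paper) for paper in papers]
    null_alphas = []
    for _ in range(n_permutations):
        rng.shuffle(pool)
        shuffled = iter(pool)
        # Deal the shuffled authors back into papers of the original sizes.
        null_papers = [[next(shuffled) for _ in range(size)] for size in sizes]
        null_alphas.append(coefficient_of_homophily(null_papers))
    alpha_prime = observed - median(null_alphas)
    # Two-tailed p-value: share of null values more extreme than observed.
    p_value = sum(abs(a) > abs(observed) for a in null_alphas) / n_permutations
    return alpha_prime, p_value


# Toy dataset with a strong excess of same-gender two-author papers.
toy_papers = [["M", "M"]] * 30 + [["F", "F"]] * 30 + [["M", "F"]] * 5
alpha_prime, p_value = adjusted_alpha(toy_papers)
print(alpha_prime, p_value)
```

Taking the median of the permuted values as the null expectation, and subtracting it from the observed value, removes the small-sample bias illustrated by Prof. Bergstrom's example, so that α′ has an expected value near zero under random assortment regardless of dataset size.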