Comment on ‘Are physicists afraid of mathematics?’

In 2012, we showed that the citation count for articles in ecology and evolutionary biology declines with increasing density of equations. Kollmer et al (2015 New J. Phys. 17 013036) claim this effect is an artefact of the manner in which we plotted the data. They also present citation data from Physical Review Letters and argue, based on graphs, that citation counts are unrelated to equation density. Here we show that both claims are misguided. We identified the effects in biology not by visual means, but using the most appropriate statistical analysis. Since Kollmer et al did not carry out any statistical analysis, they cannot draw reliable inferences about the citation patterns in physics. We show that when statistically analysed their data actually do provide evidence that in physics, as in biology, citation counts are lower for articles with a high density of equations. This indicates that a negative relationship between equation density and citations may extend across the breadth of the sciences, even those in which researchers are well accustomed to mathematical descriptions of natural phenomena. We restate our assessment that this is a genuine problem and discuss what we think should be done about it.


Mathematics plays a vital role in the sciences
Mathematical theory is an indispensable part of scientific research, capturing the essence of fundamental physical, chemical and biological processes with greater clarity, precision, rigor, and brevity than verbal arguments can achieve. In a range of disciplines, efficient dialogue between theoretical developments and empirical testing is critical to driving science forwards [1–8]. Reports of a barrier to communication between theoretical and empirical research [9–11] should therefore arouse concern.
In a recent study [12], we showed that the citation counts of articles in ecology and evolutionary biology are negatively associated with the density of mathematical equations presented in the main text. In a paper published in the New Journal of Physics, Kollmer et al [13] (hereafter KPG) attacked our interpretation of the citation patterns in ecology and evolutionary biology, criticized our methodology and argued that there is no evidence for a negative impact of high equation density on citation counts. Responding to our call for similar analyses in other fields [14], KPG also investigated the relationship between equation density and citation count in a leading multidisciplinary physics journal, Physical Review Letters. As they did for the biology data, they argued that there was no evidence for a relationship. Here we show that the conclusions drawn by KPG are incorrect.

Our original analysis is objective and valid
KPG present our data in alternative graphical formats (their figure 1) and suggest that the effect we found is simply an artefact of how we binned the data. This is incorrect: our conclusions were based on a formal statistical analysis that did not involve any binning of data, but instead treated equation density as a continuous covariate. This statistical analysis, a generalized linear model (GLM) with a negative binomial error function, is appropriate for count data that are extremely over-dispersed (in this case, the most cited papers are much more highly cited than expected from a Poisson distribution). We used binning merely to illustrate the patterns graphically, as a visual aid to readers' intuition. This binning was entirely separate from our statistical analysis, so the claim that this 'strongly influences the final outcome of the analysis' (p 3) is unfounded.
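As a rough illustration of what over-dispersion means here, the sketch below computes the variance-to-mean ratio of a set of counts. For Poisson-distributed counts this ratio is close to 1; values far above 1 indicate over-dispersion. The counts are hypothetical and chosen only for illustration (they are not taken from either data set), and the function name `dispersion_index` is our own.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Return the variance-to-mean ratio of a list of counts.

    A Poisson distribution has variance equal to its mean, so a ratio
    near 1 is Poisson-like and a ratio far above 1 is over-dispersed.
    """
    return pvariance(counts) / mean(counts)


# Hypothetical citation counts (NOT from either data set): most papers are
# modestly cited and one paper is cited far more often than the rest.
hypothetical_citations = [3, 5, 8, 2, 10, 7, 4, 6, 9, 400]

ratio = dispersion_index(hypothetical_citations)
print(f"variance / mean = {ratio:.1f}")
```

A single very highly cited paper is enough to push the ratio far above 1, which is the situation in which a negative binomial error structure is usually preferred to a Poisson one.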
KPG argue that 'the citation data is so noisy that its [sic.] not reliable for identifying unambiguous trends capable of predicting citation success' (p 7). Yet scientists in many fields routinely analyse noisy data using statistical modelling. In this case, the noisiness of the patterns implies that citation rates are, reassuringly, more strongly influenced by a combination of factors other than equation density, which alone explains a relatively small proportion of the variance, a point we emphasised in our original paper [12]. Despite incorporating all of this noisiness, our statistical models consistently revealed a significant (and sizeable) negative effect of equation density on citation count.
KPG claim that our 'original presentation of the data ... disregarded completely the fact that by far the two most cited papers ... have very high equation densities and high citation counts coming from non-theoretical papers' (p 3). This is incorrect: both our graphical representations and the formal statistical analysis included all data points, regardless of whether they were 'outliers'. Statistically, nothing can be reliably inferred from two examples of heavily cited, equation-dense outliers selected from a sample of 649 articles. These two outliers have exceptional citation counts because of factors other than equation density (e.g. exceptional scientific importance), and their counts might have been even higher had the papers been less equation-dense.
Finally, KPG conclude that our findings are 'strongly dependent on their artificial subdivision of papers into theoretical and non-theoretical work' (p 6). This is not true because the effect is still present when analysing the entire sample (22% drop in citations for each additional equation per page), without any subdivision into theoretical and non-theoretical papers.

Citations in physics show a negative effect of equation density
KPG presented data on the number of equations and citations for the set of papers published in volumes 94 and 104 of Physical Review Letters. They reported no relationship between equation density and number of citations but they did not carry out a formal statistical analysis, so their conclusions are debatable: visual comparison of binned data is a subjective and unreliable way to infer statistical trends in continuous data. In two of their figures (figures 2(a) and (d)) KPG noted a slight decrease in citations for more equation-dense papers, but stated that this 'decrease is not significant because it is well within the large error bars' (p 5). However, drawing statistical conclusions based on error bars is fraught with difficulty [15], particularly when the error bars are descriptive (e.g. standard deviation, as used by KPG) rather than inferential (e.g. standard error of the mean, SEM, as used by [12]) [16]. The degree of overlap between error bars may be a useful guide when exploring data to assess whether group means are more different than would be expected by chance [16], provided that one divides the data into only a few groups. As continuous data are binned into more and more groups, error bars will tend to become larger simply because the sample each is based on gets smaller. The size of error bars for small bins will be 'sometimes dominated by a single high-impact paper' (KPG, p 5), and thus the error bars become less informative of overall trends.
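The distinction between descriptive and inferential error bars can be sketched numerically. The example below computes the sample standard deviation (SD) and the standard error of the mean (SEM = SD / sqrt(n)) for two bins. The bin contents are hypothetical values invented for illustration, not KPG's data, and `sd_and_sem` is our own helper name.

```python
from math import sqrt
from statistics import stdev


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = stdev(values)
    return sd, sd / sqrt(len(values))


# Hypothetical bins (NOT KPG's data).
# A large bin of 100 papers with similar citation counts.
large_bin = [10, 12, 8, 11, 9, 13, 7, 10, 12, 8] * 10
# A small bin of 4 papers that contains one high-impact paper.
small_bin = [10, 12, 8, 300]

for name, values in (("large bin", large_bin), ("small bin", small_bin)):
    sd, sem = sd_and_sem(values)
    print(f"{name}: n = {len(values)}, SD = {sd:.1f}, SEM = {sem:.1f}")
```

In the large bin the SEM is a tenth of the SD, because it shrinks with the square root of the sample size, whereas the SD describes the spread of individual papers regardless of how many are sampled. In the small bin, both quantities are dominated by the single highly cited paper.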
To assess the effect of equation density objectively we carried out the same type of statistical analysis as in our original paper, using the data set presented by KPG (for details see doi: 10.5281/zenodo.58792). This formal and robust analysis shows that equation density has a statistically significant negative effect on the number of citations, leading on average to 6% fewer citations for each additional equation per page. This effect increases to 8% for papers that have been cited fewer than 100 times each.
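To make the reported effect size concrete, the following minimal sketch converts a percentage drop per additional equation per page into a model coefficient and back. It assumes a log link, so that the effect of equation density on expected citations is multiplicative; that link function is our assumption for illustration and is not stated in the text above. The function names are hypothetical, and the full analysis is the one archived at the DOI given above.

```python
from math import exp, log


def coefficient_from_percent_drop(percent_drop):
    """Coefficient on equation density implied by a given percentage drop
    in expected citations per additional equation per page.

    Assumes a log link: expected citations scale by exp(coefficient)
    for each one-unit increase in equation density.
    """
    return log(1.0 - percent_drop / 100.0)


def relative_citations(coefficient, extra_equations_per_page):
    """Expected citations relative to a paper with the baseline density."""
    return exp(coefficient * extra_equations_per_page)


# 6% fewer citations per additional equation per page (value from the text).
beta = coefficient_from_percent_drop(6.0)

for extra in (1, 2, 5):
    ratio = relative_citations(beta, extra)
    print(f"+{extra} equations per page: {ratio:.3f} of baseline citations")
```

Under this assumption the effect compounds: two additional equations per page correspond to a factor of 0.94 squared, not to a 12% drop.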

Our viewpoint has been misinterpreted
KPG state that they were motivated to write their article by our 'surprising attempt of blaming math ... for a lack of success in getting citations' (p 2). This was certainly not our intention: both of us regularly publish papers describing mathematical and computational models, so we sincerely hope that a low citation rate for mathematical work is not inevitable. Rather, we suggested that an immediate, pragmatic solution to this apparent problem would be to reduce the density of equations and add explanatory text for non-specialised readers. We have repeatedly emphasised in our papers on this topic [12,14] that it is equation density, not the number of equations, that is associated with citation counts. Thus, we recommend that the authors of theoretical work use more text to explain their equations clearly.
Our suggestions for how the negative impact of equation density can be remedied are unchanged: essential equations capturing the assumptions and structure of a model should be presented in the main text, whereas non-essential equations, such as those describing intermediate steps to solutions, need only be given in the appendices. This would have the effect of reducing equation density, and allow more space to explain the assumptions and implications of the work underlying the essential equations. Making such adjustments to presentation could potentially have strong beneficial effects on citation patterns. For instance, our statistical model predicts that, all else being equal, the 45 articles in KPG's data that are moderately well cited (50-100 citations) and equation dense (2 equations per page) would have attracted an additional 476 citations (17% of their total) if the authors had halved the density of equations in the main text.
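The logic of such a counterfactual prediction can be sketched for a single article. The example assumes a multiplicative (log-link) effect of equation density and holds everything else equal; both the function name and the example density are hypothetical. It is not an attempt to reproduce the 476-citation figure, which depends on the density and citation count of each individual article in the data set.

```python
def gain_from_halving(density, percent_drop_per_equation):
    """Proportional gain in expected citations if an article's equation
    density (equations per page) were halved, all else being equal.

    Assumes each additional equation per page multiplies expected
    citations by (1 - percent_drop_per_equation / 100).
    """
    per_unit_factor = 1.0 - percent_drop_per_equation / 100.0
    # Halving removes density / 2 equations per page, so the drop
    # associated with that many equations per page is undone.
    return per_unit_factor ** (-density / 2.0) - 1.0


# Hypothetical article with 4 equations per page, using the 8% effect
# reported in the text for papers cited fewer than 100 times.
gain = gain_from_halving(4.0, 8.0)
print(f"predicted gain from halving density: {100 * gain:.1f}%")
```

Because the effect is multiplicative, the predicted gain grows faster than linearly with the original density, so the most equation-dense articles stand to benefit most from moving non-essential equations to an appendix.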
The finding that the negative effect of equation density is also present in a data set from physics suggests that the phenomenon is not restricted to the life sciences, but extends to fields with a traditionally greater reliance on mathematics. The effect we found was considerably weaker than in a sample of papers from ecology and evolutionary biology [12], but still sizeable and statistically significant. This suggests that the problem may be even more widespread than we originally thought, perhaps affecting all disciplines that rely on mathematics to understand natural phenomena.
Ideally, the impact of scientific work should be determined by its scientific merit, rather than by presentational style. Unfortunately, it is clear that scientifically strong papers may have reduced impact if not presented in an accessible manner. We reiterate our view that all scientists aiming to communicate theory in the most effective way should take this issue seriously, rather than claiming it does not exist.