The advantage of short paper titles

Vast numbers of scientific articles are published each year, some of which attract considerable attention, and some of which go almost unnoticed. Here, we investigate whether any of this variance can be explained by a simple metric of one aspect of the paper's presentation: the length of its title. Our analysis provides evidence that journals which publish papers with shorter titles receive more citations per paper. These results are consistent with the intriguing hypothesis that papers with shorter titles may be easier to understand, and hence attract more citations.

Scientific endeavours also generate extensive written communication, in the form of papers. We define a paper to be more successful than others if it has received a greater number of citations. The online database Scopus contains citation records of papers, offering remarkable insights into academic conversation. Recently, advances have been made in quantifying scientific output based on publication statistics [24][25][26][27][28]. A number of studies have provided evidence that the long-term success of scientists depends on their early publications [29,30]. Further analyses have indicated that a paper's success can be partially predicted by its early success [31][32][33] as well as the reputation of the authors [34]. In addition, papers in particular academic domains gain more citations than others [35].
Here, we consider whether we can find any evidence that the style in which a paper is written may relate to its success. Previous studies have examined characteristics of scientific paper titles [36][37][38][39][40][41]. A subset of these studies has focused on identifying stylistic attributes of academic writing and the use of a colon or question in a paper's title [36][37][38][39]. Those which have investigated the relationship between the length of an article's title and the number of citations it receives have been limited to relatively small samples, up to a maximum of 2200 papers [40,41]. These analyses have reported conflicting results, with one study suggesting that papers with longer titles might receive more citations [41] and another finding no evidence of a relationship [40]. We exploit data on a much larger sample of 140 000 papers in order to investigate whether a paper's title length bears any relation to the number of citations it receives.

Results
We analyse data provided by Scopus, one of the leading bibliometric platforms. A Scopus user can search and export data on journal articles in batches of 20 000 records, including data on how often each article has been cited since publication. We download data on the 20 000 most cited papers in each year between 2007 and 2013.
We determine the number of characters in each paper's title, including spaces and punctuation. Using the year 2010 as an example, we rank the papers by title length and by citations (figure 1a). Upon visual inspection, there appears to be a high concentration of papers with short titles and many citations, as well as a high concentration of papers with long titles and few citations. We find that for the 20 000 most highly cited papers published in 2010, papers with shorter titles receive more citations (Kendall's τ = −0.07, N = 15 395, p < 0.001). We apply the same analysis to each year in our sample and find that papers from all years exhibit this relationship between their title length and citations (figure 1b; all τs < −0.042, all ps < 0.001, α = 0.05, Kendall's τ correlation with false discovery rate (FDR) correction).
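The rank correlation described above can be sketched in a few lines. This is a minimal illustration and not the code used in the study: it assumes the tie-corrected τ-b variant of Kendall's τ and the Benjamini-Hochberg procedure for the FDR correction, neither of which is specified in the text. The function names and the toy data are hypothetical, and the p-value of each τ would have to come from a statistics library.

```python
from itertools import combinations
from math import sqrt


def title_length(title):
    """Number of characters in a title, counting spaces and punctuation."""
    return len(title)


def kendall_tau_b(x, y):
    """Kendall's tau-b (assumed variant), computed over all pairs in O(n^2)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method), input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): three titles and their citation counts.
titles = [
    "Short title",
    "A considerably longer and more detailed title",
    "Mid-length title here",
]
citations = [120, 15, 60]
lengths = [title_length(t) for t in titles]
tau = kendall_tau_b(lengths, citations)
print(lengths, tau)
```

For samples of roughly 20 000 papers per year, the quadratic pairwise loop would be slow, so a library implementation with an O(n log n) algorithm would be the more practical choice.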
Some journals may attract a greater number of citations for their papers owing to their reputation. To remove any potential influence of the journal in which a paper is published on the relationship between citations received and paper title length, we rank all of the papers in terms of the number of citations received and transform these ranks into percentiles. We calculate percentiles in terms of the length of papers' titles in the same fashion. In this transformed dataset, for papers published in 2010, we find that papers with shorter titles still receive more citations, but the τs are smaller in magnitude than in the untransformed data. This suggests that the journal in which a paper is published may help explain the relationship between paper title length and the number of citations the paper receives.
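A percentile transform of this kind can be sketched as follows. The sketch assumes that ranks are computed separately within each journal, which is one reading of how the transform removes the journal's influence; the text does not state the grouping explicitly. The percentile convention (average rank divided by group size), the record fields `journal` and `citations`, and the function names are likewise hypothetical.

```python
from collections import defaultdict


def percentile_ranks(values):
    """Map each value to a percentile (0-100], using average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of tie group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return [100.0 * r / n for r in ranks]


def within_journal_percentiles(papers, key):
    """Percentile of papers[i][key] among papers from the same journal.

    Assumption: grouping by journal is what removes the journal effect.
    """
    groups = defaultdict(list)
    for index, paper in enumerate(papers):
        groups[paper["journal"]].append(index)
    result = [0.0] * len(papers)
    for indices in groups.values():
        percentiles = percentile_ranks([papers[i][key] for i in indices])
        for i, p in zip(indices, percentiles):
            result[i] = p
    return result


# Toy records (hypothetical): two journals with very different citation levels.
papers = [
    {"journal": "a", "citations": 5},
    {"journal": "a", "citations": 9},
    {"journal": "b", "citations": 100},
]
citation_percentiles = within_journal_percentiles(papers, "citations")
print(citation_percentiles)
```

Under this reading, a paper is compared only with papers from its own journal, so a modestly cited paper in a low-citation journal can still rank highly.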
To investigate this hypothesis further, we group papers by their journal. Again, using 2010 as an example, we calculate the median number of citations and median title length for each journal. We find that journals which published papers with shorter titles also tend to receive more citations per paper (all τs ≤ −0.14, all ps < 0.001, α = 0.05; Kendall's τ correlation with FDR correction). Finally, we carry out a complementary aggregated analysis across all years of data in our sample. We rank all papers published in a given year by citations received and by title length, and transform these ranks into percentiles for that year. Again, we find that journals which publish papers with shorter titles also tend to receive more citations per paper (figure 3; τ = −0.19, N = 625, p < 0.001, Kendall's τ correlation).
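The journal-level aggregation can be illustrated with a short sketch. The record fields `journal`, `title` and `citations` and the function name are hypothetical. The resulting pairs of medians, one pair per journal, are what a rank correlation such as Kendall's τ would then be computed on.

```python
from collections import defaultdict
from statistics import median


def journal_medians(papers):
    """Return {journal: (median title length, median citations)}."""
    lengths = defaultdict(list)
    citations = defaultdict(list)
    for paper in papers:
        # Title length counts every character, including spaces and punctuation.
        lengths[paper["journal"]].append(len(paper["title"]))
        citations[paper["journal"]].append(paper["citations"])
    return {j: (median(lengths[j]), median(citations[j])) for j in lengths}


# Toy records (hypothetical) for a single publication year.
papers = [
    {"journal": "x", "title": "ab", "citations": 10},
    {"journal": "x", "title": "abcd", "citations": 30},
    {"journal": "x", "title": "abcdef", "citations": 20},
    {"journal": "y", "title": "abc", "citations": 1},
    {"journal": "y", "title": "abcde", "citations": 3},
]
medians = journal_medians(papers)
print(medians)
```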
Our primary analysis is based on rank-based statistics. To complement it, we fit a mixed-effects model to the log of the number of citations a paper receives as a function of its title length. A mixed-effects model allows us to control for the journal in which each paper is published. We define our model as

$$\log_{10}(c_{j,p}) = I + I_j + (L + L_j)\, l_{j,p} + \epsilon_{j,p}, \qquad (2.1)$$

where $c_{j,p}$ is the number of citations received by paper $p$ published in journal $j$. The distribution of citations received by a paper is highly positively skewed. For this reason, we log these citation counts, so that the distribution of the residuals of our model, $\epsilon_{j,p}$, is closer to a Gaussian distribution. The grand intercept is $I$, whereas $I_j$ is an intercept for each journal. There is a fixed slope $L$ for the number of characters in the title, $l_{j,p}$, for paper $p$ published in journal $j$. There is also a journal-level random effect $L_j$ on this slope.
Table 1. Mixed-effects model of the relationship between paper title length and citations received. Our primary analysis in figures 1 and 2 is based on rank statistics. To complement this analysis, we fit linear models to the data. We fit a mixed-effects model to the log of the number of citations a paper receives as a function of its title length (equation (2.1)). The model includes a fixed slope L for the number of characters in a paper's title. We fit this model for each year in our dataset and display the slopes here under 'for individual papers'. We find that, for each year, the slope is negative. We also investigate whether this relationship exists when aggregating papers by the journal in which they are published. We fit a linear regression model to the log of the median number of citations papers receive per journal as a function of the median title length (equation (2.2)). There is a slope L for the median number of characters in the titles of papers published in each journal. We fit this model for each year in our dataset and display the slopes here under 'for individual journals'. Again, we find that for each year, the slope is negative. Asterisks represent FDR-corrected p-values for a t-test of the slope: * p < 0.05, ** p < 0.01, *** p < 0.001.
We fit the model for each year. We find that journals which publish papers with shorter titles also tend to receive more citations per paper.
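The journal-level regression described in the caption of table 1 can be sketched as an ordinary least-squares fit of the log of the median citation count on the median title length. This is a minimal sketch with hypothetical function names and toy numbers, and it assumes every journal has a median citation count greater than zero. It does not cover the mixed-effects model of equation (2.1), which would need a dedicated statistics library, nor the t-test of the slope.

```python
from math import log10


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def journal_level_fit(median_lengths, median_citations):
    """Regress log10(median citations) on median title length, per journal.

    Assumes all median citation counts are strictly positive.
    """
    return fit_line(median_lengths, [log10(c) for c in median_citations])


# Toy values (hypothetical): three journals with falling citations as titles lengthen.
intercept, slope = journal_level_fit([50, 100, 150], [1000, 100, 10])
print(intercept, slope)
```

A negative slope in such a fit corresponds to the pattern reported in the table: journals with longer median titles have lower median citation counts.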

Discussion
Figure 3. For each year in our dataset, we rank all of the papers in terms of the number of citations received and in terms of the length of the titles, and transform these ranks into percentiles for a given year. For each journal, we then calculate the average quantile of the citations and of the title lengths, across papers and across years. Here, each blue circle represents a journal, and the size of each circle represents the number of papers in our sample for that journal. Again, we find that journals that publish papers with shorter titles also tend to receive more citations per paper (Kendall's τ = −0.19, N = 625, p < 0.001).
In this study, we investigate whether the length of a scientific paper's title is related to the number of citations it receives. We analyse the 20 000 most highly cited papers for each year between 2007 and 2013. Previous studies have reported either that the length of a paper's title bears no relation to its scientific impact [40], or that longer titles can be linked to greater citation counts [41]. Our analysis suggests that papers with shorter titles do receive greater numbers of citations. However, it is well known that papers published in certain journals attract more citations than papers published in others. When citation counts are adjusted for the journal in which the paper is published, we find that the strength of the evidence for the relationship between title length and citations received is reduced. Our results do, however, reveal that journals which publish papers with shorter titles tend to receive more citations per paper.
We propose three possible explanations for these results. One potential explanation is that high-impact journals might restrict the length of their papers' titles. Similarly, incremental research might be published under longer titles in less prestigious journals. A third possible explanation is that shorter titles may be easier to understand, enabling wider readership and increasing the influence of a paper.
Our findings provide evidence that elements of the style in which a paper is written may relate to the number of times it is cited. Future analysis will investigate whether further stylistic attributes of the language used in a paper can be related to the number of citations it receives.

Methods
We retrieve bibliometric data from Scopus. Some journals are referred to with multiple variations of their name (for example, 'Analyst' and 'The Analyst'). For this reason, we clean the dataset from Scopus by deleting leading 'The's from each journal's title, and converting the title to lower case. We also identify all journals which have fewer than 10 papers in the most cited 20 000 papers for a given year, and remove the papers in such journals for that year. The basic characteristics of our dataset before and after cleaning are depicted in the electronic supplementary material, figure S1.
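The cleaning steps above can be sketched as follows. The record fields `journal` and `year` and the function names are hypothetical, and the sketch assumes a leading 'The' is always followed by a space.

```python
from collections import Counter


def normalise_journal(name):
    """Lower-case a journal name and drop a leading 'The '."""
    cleaned = name.strip().lower()
    if cleaned.startswith("the "):
        cleaned = cleaned[len("the "):]
    return cleaned


def clean_records(records, min_papers=10):
    """Normalise journal names, then drop journals with fewer than
    min_papers papers in a given year."""
    normalised = [dict(r, journal=normalise_journal(r["journal"])) for r in records]
    counts = Counter((r["year"], r["journal"]) for r in normalised)
    return [r for r in normalised if counts[(r["year"], r["journal"])] >= min_papers]


# Toy records (hypothetical): two spellings of one journal, plus a sparse one.
records = (
    [{"journal": "The Analyst", "year": 2010}] * 6
    + [{"journal": "Analyst", "year": 2010}] * 4
    + [{"journal": "Nature", "year": 2010}] * 3
)
cleaned = clean_records(records)
print(len(cleaned))
```

Counting papers only after the names are normalised matters here: the two spellings of the same journal are pooled before the threshold of 10 papers per year is applied.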
Data accessibility. Datasets used in this study are available via the Dryad Repository (doi:10.5061/dryad.hg3j0).
Authors' contributions. A.L., H.S.M. and T.P. performed analyses, discussed the results and contributed to the text of the manuscript.