Inferring the causal effect of journals on citations

Articles in high-impact journals are by definition more highly cited on average. But are they cited more often because the articles are somehow "better"? Or are they cited more often simply because they appeared in a high-impact journal? Although some evidence suggests the latter, 1-3 the causal relationship is not clear. We here compare citations of published journal articles to citations of their preprint versions to uncover the causal mechanism. We build on an earlier model 4 to infer the causal effect of journals on citations. We find evidence for both effects. We show that high-impact journals seem to select articles that tend to attract more citations. At the same time, we find that high-impact journals augment the citation rate of published articles. Our results yield a deeper understanding of the role of journals in the research system. The use of journal metrics in research evaluation has been increasingly criticised in recent years, 5 and article-level citations are sometimes suggested as an alternative. Our results show that removing impact factors from evaluation does not negate the influence of journals. This insight has important implications for changing practices of research evaluation.
The journal impact factor has been criticised on several accounts. 6 The main critique is its pervasive use in the context of research evaluation, for example in tenure decisions. 7 Scientists also shape their research with impact factors in mind. 8,9 Some even speak of impact factor mania. 10 At a meeting in San Francisco, cell biologists called for banning the impact factor from research evaluation, and drew up the "San Francisco Declaration on Research Assessment" 11 (DORA). A group of researchers and editors called for publishing entire citation distributions instead of impact factors, to counter inappropriate use. 12 More recently, a group of editors and researchers came together and called for "rethinking impact factors". 13 At the same time, the impact factor of a journal is one of the clearest predictors of future citations. [14][15][16][17] The question is why. Possibly, journals select articles of a high "quality", which then go on to be cited frequently. In this case, journal metrics are possibly a more accurate indicator of "quality" than noisy individual citations. 18 Another possibility is that journals do not select articles of a high "quality", but that articles are simply cited more frequently because they are published in a high-impact journal.
Answering this question is not straightforward. It requires an independent measurement of "quality", but post-publication reviews are likely to be affected by the journal. In rare cases, the same article is published in multiple journals, and researchers found that the version in a higher impact journal was more frequently cited than its twin in a lower impact journal. [1][2][3] However, duplicate publications are quite special, limiting the generalisability of this observation. If a duplicate publication had been published in a single journal, it would perhaps simply have accumulated all citations that are now scattered across several journals. Some other earlier work claimed that citations were not affected by the journal. 19 We answer this question by comparing citations to preprints with citations to the published version. The number of citations C may be influenced by both the latent citation rate φ and the journal J in which the article is published (Fig. 1). Possibly, high-impact journals perform a stringent peer review of articles, selecting only articles with a high latent citation rate, so that φ influences the journal J. The latent citation rate itself may be influenced by many factors and characteristics of the paper 20 and motivations for citing the paper. 21 Crucially, the number of citations C′ to the preprint before it is published is unaffected by where it will be published and is affected only by the latent citation rate φ. We rely on this insight to estimate the causal effect of the journal on citations, Pr(C | do(J)).

Figure 1. Simple causal model. The latent citation rate φ of an article confounds the effect of the journal J in which it is published on the citations C it accrues. In contrast, citations to preprints C′ are affected by the latent citation rate φ only. The selection bias on arXiv preprints A does not bias the causal effect of J on C once φ is controlled for.

a) Electronic mail: v.a.traag@cwts.leidenuniv.nl
The identification of the causal effect is possible because of so-called "effect restoration", 22 provided we can estimate Pr(C′ | φ). We construct a parametric model that provides exactly such an estimate.
We gathered information about 1 341 016 preprints from arXiv, and identified the published version for 727 186 preprints (54%). We extracted citations to both the preprint version (using arXiv identifiers) and the published version from references in Scopus. Preprint dates, publication dates and citation dates are all extracted from CrossRef, using a daily granularity. We used the major subject headings of arXiv as field definitions. The impact of journals is calculated as the average number of citations received in the first five years after publication for all research articles and reviews in Scopus. We perform our analysis per year (2000-2016) and field, and restrict to journals that have at least 20 articles that were published at least 30 days after appearing as a preprint on arXiv (Fig. B.1).
There is a clear selection bias 23 in which papers are posted on the arXiv (A). We assume that the latent citation rate φ may affect whether a paper is posted on the arXiv (A), which in turn may affect the journal J. Previous research showed that publications that are available as preprints are more highly cited, 24,25 but this effect seemed unlikely to be causal. 26,27 Whether a paper is posted on the arXiv therefore does not directly influence the citations C. If we control for φ (which is effectively done by controlling for C′), we obtain Pr(C | do(J), A = 1, φ) = Pr(C | do(J), φ) by the rules of do-calculus. 28 We thus obtain an unbiased estimate of the causal effect Pr(C | do(J)), even though our observations are biased towards arXiv papers.
Time complicates our analysis. The time T′ before a preprint is published, the preprint duration, clearly affects the number of pre-publication citations C′, while the total time T since publication affects the post-publication citations C. Preprints with a higher latent citation rate may be published more quickly, thus affecting T′. To tackle this problem, we model the temporal dynamics of citations, both pre- and post-publication.

MODEL
Citation dynamics are influenced by a wide range of factors, such as a rich-get-richer effect and a clear temporal decay, 29 but are captured reasonably well by a recent model. 4 We build on that model 4 and include a parameter that modulates the citation rate based on where the article was published. We assume that the number of citations $c_i(t)$ that article $i$ receives at time $t$ is Poisson distributed as

$$c_i(t) \sim \operatorname{Poisson}\big(\lambda_i(t)\, f_i(t)\, (m + C_i(t-1))\big), \qquad (1)$$

with effective citation rate $\lambda_i(t)$ and $C_i(t) = \sum_{\tau=0}^{t} c_i(\tau)$ the cumulative number of citations. The term $m + C_i(t-1)$ captures the rich-get-richer effect, with $m$ a constant as in the earlier model. 4 The temporal decay of the accumulation of citations is captured by $f_i(t)$, which is modelled by an exponential distribution with inverse rate $\beta_i$, that is, $f_i(t) = \beta_i^{-1} e^{-t/\beta_i}$. We assume that preprint $i$ attracts citations at a rate of $\phi_i$, where $\phi_i$ is the latent citation rate of article $i$. The published version attracts citations at an effective rate of $\phi_i \theta_{J_i}$, where $\theta_{J_i}$ is the journal citation multiplier for the journal $J_i$ in which article $i$ is published. Hence $\lambda_i(t) = \phi_i$ for $t \le T'_i$ and $\lambda_i(t) = \phi_i \theta_{J_i}$ for $t > T'_i$, where $T'_i$ is the preprint duration. We equate $\theta_j$ with the causal effect on citations of publishing in journal $j$. We call $C'_i = C_i(T'_i)$ the pre-publication citations and $C_i = C_i(T_i) - C_i(T'_i)$ the post-publication citations. The expected number of long-term citations is about $m(e^{\phi_i \theta_{J_i}} - 1)$, assuming pre-publication citations are negligible.
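To make the citation model described above concrete, the following Python sketch simulates daily citation counts for a single article. It assumes NumPy, measures time in days from the moment the preprint is posted, and uses illustrative parameter values (including m = 30) that are our own choices rather than estimates from this paper. `simulate_citations` is a hypothetical helper, not part of the replication code.

```python
import numpy as np


def simulate_citations(phi, theta, beta, m, preprint_days, total_days, seed=0):
    """Simulate daily citation counts for one article (hypothetical helper).

    Day t draws c(t) ~ Poisson(lam(t) * f(t) * (m + C(t - 1))), where
    lam(t) = phi up to and including the publication day and phi * theta after it,
    f(t) is an exponential density with mean beta days, and C is the running total.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(total_days + 1, dtype=np.int64)
    cumulative = 0
    for t in range(total_days + 1):
        lam = phi if t <= preprint_days else phi * theta
        f_t = np.exp(-t / beta) / beta  # exponential decay density
        counts[t] = rng.poisson(lam * f_t * (m + cumulative))
        cumulative += counts[t]  # rich-get-richer: past citations raise the rate
    return counts


# Illustrative values only: one year as a preprint, ten years of observation.
counts = simulate_citations(phi=1.0, theta=3.0, beta=730.0, m=30.0,
                            preprint_days=365, total_days=3650)
pre_publication = counts[:366].sum()   # citations on or before the publication day
post_publication = counts[366:].sum()  # citations after publication
print(pre_publication, post_publication)
```

Raising `theta` while holding `phi` fixed changes only the expected post-publication counts; the pre-publication days depend on `phi` alone, which is the separation the model relies on.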
The selection of articles by peer review is assumed to lead to a distribution of latent citation rates for journal $j$,

$$\phi_i \sim \operatorname{LogNormal}(\Phi_j, \varepsilon_j) \quad \text{for all articles } i \text{ with } J_i = j. \qquad (2)$$

If $\Phi_j$ is high, journal $j$ will tend to publish articles with higher latent citation rates $\phi_i$. The median latent citation rate of journal $j$ is $e^{\Phi_j}$. Effectively, this is a Bayesian hierarchical model, and we specify informed prior distributions based on earlier results 4 (see Appendix A 2 for details).

Figure 2. Illustration of citation dynamics. This example, astro-ph/0405353, was first submitted to the arXiv in 2004 and was published in Journal of Cosmology and Astroparticle Physics almost four years later ($T'_i = 1\,385$ days). It was cited 33 times before it was published ($C'_i = 33$), and 29 times after it was published ($C_i = 29$). We assume citations are attracted at a rate of $\phi_i$ before it was published and at a rate of $\phi_i \theta_{J_i}$ after it was published. The thick line represents the empirically observed number of citations. The thin shaded lines represent samples from the posterior predictive distribution.

RESULTS
The numbers of pre- and post-publication citations are not clearly related (Fig. 3A). The number of pre-publication citations also does not clearly relate to journal impact, and shows some curvilinearity (Fig. 3B). The relation between preprint duration and the number of pre-publication citations is also unclear and seems curvilinear (Fig. 3C). This can possibly be explained by two counteracting effects: higher latent citation rates lead to more pre-publication citations, but also to shorter preprint durations, which reduce the time available to attract pre-publication citations. The ratio of post-publication to pre-publication citations is higher for high-impact journals (Fig. 3D): articles in high-impact journals accumulate more post-publication citations relative to pre-publication citations than articles in lower impact journals. This result is confounded by the preprint duration, so we cannot draw hard conclusions from it. The model that we constructed is intended to address this issue.
We here report results from our model for entire journals (detailed results per field and year are available in Appendix B). […] post-publication citations (Fig. B.6).
The journal citation multiplier is almost always higher than 1 (Fig. 4A). Publishing in a journal, compared to being available only on the arXiv, multiplies the citation rate substantially. For example, Nature shows a multiplier of about 5, meaning that a Nature article that obtained about 200 citations would have obtained about 15 citations had it been available on the arXiv only (assuming it had no pre-publication citations). This is far fewer than 200/5 = 40, because the multiplier acts on the rate $\phi_i$, which enters the expected long-term citations $m(e^{\phi_i \theta_{J_i}} - 1)$ exponentially. Higher impact journals clearly show higher citation multipliers, although there are clear exceptions. For example, Reviews of Modern Physics shows a citation multiplier of about 11 and a journal impact of 241. Physics Reports, on the other hand, shows a citation multiplier of only 1.3, whereas it has a journal impact of 76. Work that appeared in Reviews of Modern Physics would have drawn substantially fewer citations if it had been published in Physics Reports.
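The back-of-the-envelope calculation behind the Nature example can be reproduced from the long-term approximation m(e^{φθ} − 1). The sketch below assumes m = 30, which is our assumption since the value of m is not stated in this section; with that value it reproduces the figure of about 15 citations. `counterfactual_arxiv_only` is a hypothetical helper.

```python
import math


def counterfactual_arxiv_only(published_citations, theta, m=30.0):
    """Long-term citations an article would have had on arXiv only (hypothetical helper).

    Solves published_citations = m * (exp(phi * theta) - 1) for phi, then evaluates
    m * (exp(phi) - 1), i.e. the same article with multiplier 1. Pre-publication
    citations are assumed negligible, and m = 30 is an assumed value.
    """
    phi = math.log(published_citations / m + 1.0) / theta
    return m * (math.exp(phi) - 1.0)


print(round(counterfactual_arxiv_only(200, theta=5), 1))  # 15.1, far below 200 / 5 = 40
```

Because citations grow exponentially in φθ under this approximation, dividing the citation count by the multiplier would overstate the counterfactual considerably.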
The median latent citation rate $e^{\Phi_j}$ clearly increases with journal impact (Fig. 4B). Physics Reports, for example, shows a median latent citation rate of about 0.92, while Reviews of Modern Physics shows a median latent citation rate of about 0.37. The difference between these two review journals likely stems from a difference in submission policies: Physics Reports only accepts invited reviews for submission, whereas Reviews of Modern Physics is open to submissions from anyone. In contrast to these two review journals, the US-based Physical Review Letters publishes short letters, and shows a median latent citation rate of about 0.18. Its lower impact European counterpart Europhysics Letters shows a median latent citation rate of about 0.082. The median effective citation rate of a journal is $e^{\Phi_j} \theta_j$, which aligns closely with the observed journal impact (Fig. B.3).
Of course, latent citation rates also vary within journals, which is controlled by $\varepsilon_j$. Journals with a higher $\varepsilon_j$ tend to publish articles with a larger variety of latent citation rates. For example, Physical Review E shows an $\varepsilon_j$ of about 0.8, while Science shows an $\varepsilon_j$ of about 0.3, resulting in a broader distribution of $\phi_i$ for Physical Review E than for Science. Interestingly, high-impact journals show narrower distributions of latent citation rates than lower impact journals (Fig. 4C).
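To see what an ε_j of about 0.8 versus 0.3 means in practice, the quartiles of φ_i can be computed directly. This sketch assumes, as our reading of the model, that log φ_i is normally distributed with mean Φ_j and standard deviation ε_j; Φ_j = 0 is an arbitrary illustrative value and `latent_rate_quartiles` is a hypothetical helper using only the Python standard library.

```python
from math import exp
from statistics import NormalDist


def latent_rate_quartiles(Phi, eps):
    """Lower quartile, median and upper quartile of phi when log(phi) ~ Normal(Phi, eps)."""
    z = NormalDist().inv_cdf(0.75)  # upper quartile of the standard normal, about 0.674
    return exp(Phi - z * eps), exp(Phi), exp(Phi + z * eps)


for journal, eps in [("Physical Review E", 0.8), ("Science", 0.3)]:
    low, median, high = latent_rate_quartiles(0.0, eps)
    # The ratio of upper to lower quartile, exp(2 * z * eps), depends on eps only.
    print(f"{journal}: upper/lower quartile ratio {high / low:.2f}")
```

With these values the upper quartile is roughly 2.9 times the lower quartile for ε = 0.8, against roughly 1.5 for ε = 0.3, whatever the value of Φ_j.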

DISCUSSION
Why articles in high-impact journals attract more citations is a fundamental question. We provided clear evidence that high-impact journals are highly cited because of two effects. On the one hand, articles that attract more citations are more likely to be published in high-impact journals. On the other hand, articles in high-impact journals are cited even more frequently because of the publication venue. This amplifies the cumulative advantage effect for citations. 30 Our results of course hinge on the extent to which our model and assumptions are realistic. Although we believe that the model is quite reasonable and fits the observations reasonably well, others may disagree with some of our assumptions. This is a natural state of affairs, and should be welcomed. Progress can only be made through discussion among scientists, and we hope to learn with every step we take. A very recent publication 31 took a similar approach and reached similar conclusions, corroborating our results.
Several mechanisms may play a role in the causal effect of journals on citations. High-impact journals tend to have a higher circulation, 32 and reach a wider audience. In addition, researchers may prefer to cite an article from a high-impact journal over an article from a low-impact journal, even if both articles would be equally fitting. Both mechanisms are consistent with our results and with earlier results. [1][2][3] Without further data we cannot distinguish between these two mechanisms. 33 An alternative explanation may be that published preprints are more highly cited because they were improved by high-quality peer review in high-impact journals. We find this scenario unlikely. Differences between the preprint and the published version tend to be textually minor. 34 Those changes can of course be substantively important: peer review may substantially improve and strengthen a manuscript. Nonetheless, we think it is unlikely to alter the core contribution of a paper so extensively that its citation rate is affected.
Our analysis is limited mostly to physics and mathematics because of our reliance on the arXiv. We expect to see similar effects in the medical sciences or the social sciences. It would be good to replicate our analysis on other preprint repositories, such as bioRxiv or SocArXiv. Another limitation is that we only considered references from published articles. It would be interesting to also include the references of preprints. Including them is likely to increase the number of pre-publication citations, 24 which may decrease the inferred journal causal effect.
The latent citation rate itself may be influenced by many factors and characteristics of the paper 20 and motivations for citing the paper. 21 Overall, our results suggest that paper characteristics ($X_1, X_2, \ldots$) that drive citations (C) overlap to some extent with factors that drive journal (J) peer review (Fig. 5). For example, novelty, relevance and scientific breadth may perhaps affect both journal evaluation and citations directly, while methodological aspects perhaps only affect journal evaluation and authors' reputation only affects citations. However, because the journal also affects citations, methodological aspects would have an indirect effect on citations in this example. Which factors drive journal evaluation and which factors drive citations is not clear and should be further investigated.
We conjecture that a subset of factors that are used in journal evaluation are also used in post-publication research evaluation. This means that research evaluation (E) tends to correlate with journals because of underlying common factors. Even if factors that influence research evaluation did not influence citations directly, the two would still correlate because of the mediating effect of the journal. If this were the case, the correlation between evaluation outcomes and citations should be reduced when controlling for the journal. Previous research provides some support for this. 35 If this holds true, citations would be indicative of evaluation outcomes only because of the journal in which the research was published. The journal itself might even be a more appropriate indicator. Possibly, evaluation itself is also affected directly by the journal in which the research is published. Depending on the context, evaluation may also be affected directly by citations. Indeed, the proposed causal diagram in Fig. 5 only captures part of a larger web of entanglement.
Our results neither affirm nor refute the argument by Waltman and Traag 18 about the use of journal metrics for research evaluation. They argue that stringent high-quality peer review by journals could lead to a homogeneous distribution of "value", and that citation rates could be a noisier indicator of "value". If that were the case, journal metrics would be a better indicator of "value" than article-level citations. Whether latent citation rates as defined here reflect this "value" is up for debate. Incorporating the effect of journals on citations in the model of Waltman and Traag 18 does not refute their argument either. In fact, it might even strengthen their argument. The question whether journals are a more accurate indicator than article-level citations thus remains open.
The use of citations and journals in research evaluation is often debated. Removing the use of journal metrics from research evaluation, as advocated for example by DORA, may decrease the pressure on authors to publish in high-impact journals. The use of article-level citations for evaluation could be condoned by DORA, but the use of journal metrics could not. Ironically, article-level citations may be informative precisely because they are influenced by where the research is published. Even if journal metrics were to be removed from […]

Figure 5. Causal diagram relating Evaluation (E), Journal (J) and Citation (C).

Appendix A: Methods

Data
We combined data from arXiv, CrossRef and Scopus to establish our dataset. All data is made available for replication 37 , and source code is available from https://github.com/vtraag/journal-causal-effect-replication.

a. arXiv
We downloaded data from a bulk export from arXiv from https://archive.org/download/arxiv-bulk-metadata and used the file arxiv biblio oai dc.2018-01-19.xml.
For all arXiv XML elements in the data we extracted the arXiv identifier, and if present the DOI. We also extracted the date the preprint was first posted on arXiv. In total, this dataset covered 1 341 016 preprints, and a DOI is provided for 727 186 preprints (54%).
We extracted the subject for each arXiv preprint. The subjects were quite noisy: they did not contain only the subject division of arXiv, but also other subject classifications, most notably the Mathematical Subject Classification (MSC). The arXiv subject classifications were provided as "Major - Minor" subjects, although sometimes only a major subject was provided. We extracted the major part and assigned an arXiv preprint to a major subject if that subject is used by at least 1 000 preprints (and is not an MSC). We thus retain the 18 major subjects listed in Table A.1. Preprints can be assigned to multiple major subjects, as shown in Table A. The large majority of arXiv preprints is assigned to a single major subject (80%). A single preprint has been assigned to as many as 8 different major subjects (1108.2700). Only 261 preprints have not been assigned to any of the major subjects. These are preprints in economics (33) and electrical engineering (228), subjects which were introduced in September 2017, and in which arXiv did not yet have many preprints at the time of data collection.
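The subject filtering described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the replication code: it assumes subjects arrive as strings such as "Physics - Optics", that the major part is the text before " - ", and that MSC labels are supplied as an explicit exclusion set. `assign_major_subjects` and the example labels are made up for illustration.

```python
from collections import Counter


def assign_major_subjects(subjects_per_preprint, min_count=1000, excluded=frozenset()):
    """Return, for each preprint, the sorted list of retained major subjects.

    A subject string "Major - Minor" (or just "Major") is reduced to its major part.
    A major subject is retained if at least `min_count` preprints use it and it is
    not in `excluded` (for example, labels identified as MSC codes).
    """
    majors_per_preprint = [
        {subject.split(" - ", 1)[0].strip() for subject in subjects}
        for subjects in subjects_per_preprint
    ]
    usage = Counter(major for majors in majors_per_preprint for major in majors)
    retained = {major for major, n in usage.items()
                if n >= min_count and major not in excluded}
    return [sorted(majors & retained) for majors in majors_per_preprint]


example = [
    ["Physics - Optics", "Mathematics - Algebraic Geometry"],
    ["Physics"],
    ["Mathematics - Number Theory"],
    ["14J60"],
]
print(assign_major_subjects(example, min_count=2, excluded={"14J60"}))
```

A preprint whose subjects all fall below the threshold ends up with an empty list, which corresponds to the few unassigned preprints mentioned in the text.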

b. CrossRef
We established the publication date using CrossRef, which is available in-house at CWTS. We used the CrossRef database that was imported in August 2018. We determined the publication date as the first date of the following dates from CrossRef: "published online", "published print", "created" and "issued". We established the publication date for all arXiv preprints. Out of the 727 186 DOIs provided in arXiv, we find a match in CrossRef for 722 003 articles (99%). We established the publication date for all citing publications using CrossRef in the same way. See section A 1 c for more details concerning the citing publications.

c. Scopus
The Scopus database is available in-house at CWTS, and we used it for our analysis. We relied on the Scopus database that was imported in May 2018.
We used Scopus to find the published version of the preprint. This was done by matching the DOI from arXiv with the DOI as recorded in Scopus. Out of the 722 003 DOIs from arXiv that were matched to CrossRef, we found 664 741 DOIs from Scopus with a unique match (92%). We used the matched publication in Scopus to identify the journal in which the preprint was published.
We calculated the impact of journals using Scopus. We defined the impact as the average number of citations received in the first five years after publication for all articles (document type ar) and reviews (document type re). For articles that were published within five years of the end of the database (2018), we counted citations until the end of the database.
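The journal impact indicator can be sketched as follows. The record layout, the helper name `journal_impact`, and the reading of "first five years" as the publication year plus the four following years are assumptions made for illustration; the toy records are invented.

```python
from collections import defaultdict


def journal_impact(publications, citations, window=5, last_year=2018):
    """Mean citations per article or review within `window` years of publication.

    publications: iterable of (pub_id, journal, pub_year, doc_type)
    citations: iterable of (cited_pub_id, citing_year)
    Only Scopus document types "ar" (article) and "re" (review) are counted.
    Citations after `last_year` (the end of the database) are never counted, so
    recent publications simply have a truncated window.
    """
    eligible = {pub_id: (journal, year)
                for pub_id, journal, year, doc_type in publications
                if doc_type in ("ar", "re")}
    n_pubs = defaultdict(int)
    n_cites = defaultdict(int)
    for journal, _ in eligible.values():
        n_pubs[journal] += 1
    for cited_id, citing_year in citations:
        if cited_id not in eligible:
            continue
        journal, year = eligible[cited_id]
        if year <= citing_year < year + window and citing_year <= last_year:
            n_cites[journal] += 1
    return {journal: n_cites[journal] / n for journal, n in n_pubs.items()}


pubs = [("a", "J1", 2010, "ar"), ("b", "J1", 2012, "re"),
        ("c", "J1", 2012, "le"), ("d", "J2", 2015, "ar")]
cites = [("a", 2010), ("a", 2014), ("a", 2015), ("b", 2013),
         ("c", 2013), ("d", 2016), ("d", 2019)]
print(journal_impact(pubs, cites))  # {'J1': 1.5, 'J2': 1.0}
```

The letter `c` is ignored because its document type is neither "ar" nor "re", and the 2019 citation to `d` falls after the assumed end of the database.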
Finally, we used Scopus to identify citations of both the preprint version and the published version. We parsed all raw cited reference strings provided in Scopus to extract an arXiv identifier or a DOI. We identified arXiv identifiers in the reference string using the regular expression If the reference was matched by Scopus, and a cited publication was identified, we used the DOI from the cited publication as recorded in Scopus. If that was not available, we used the DOI in the reference string extracted using the regular expression For all citing documents, we extracted the publication date through CrossRef, as described in section A 1 b. We used this date as the citation date of the cited document. The citation date is used at the resolution of a day. Citations that were made on or before the journal publication date of the preprint are called pre-publication citations, and citations that were made after the publication date are called post-publication citations. In total, we identified 156 528 pre-publication citations and 15 939 887 post-publication citations from references in Scopus.
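The regular expressions used in the paper are not reproduced in this section. Purely as an illustration of the parsing step, the sketch below uses our own simplified patterns for new-style arXiv identifiers (prefixed by "arXiv:"), old-style identifiers such as astro-ph/0405353, and DOIs. These patterns are assumptions and will miss or mis-parse some real reference strings; `extract_identifiers` and the reference strings are made up.

```python
import re

# Simplified, illustrative patterns (assumptions, not the paper's expressions).
ARXIV_NEW = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)
ARXIV_OLD = re.compile(r"\b([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?")
DOI = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)")


def extract_identifiers(reference):
    """Return (arXiv identifiers, DOIs) found in a raw cited-reference string."""
    arxiv_ids = [m.group(1) for m in ARXIV_NEW.finditer(reference)]
    arxiv_ids += [m.group(1) for m in ARXIV_OLD.finditer(reference)]
    # Trailing punctuation from the surrounding sentence is not part of the DOI.
    dois = [m.group(1).rstrip(".,;") for m in DOI.finditer(reference)]
    return arxiv_ids, dois


# Made-up reference strings for illustration.
print(extract_identifiers("Doe J., 2011, arXiv:1108.2700v2"))
print(extract_identifiers(
    "Example A., Some Journal 1 (2000) 1, astro-ph/0405353, doi:10.1234/example.5678."))
```

Version suffixes such as `v2` are dropped so that all versions of a preprint map to the same identifier.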

Model and Bayesian inference
The full specification of the hierarchical Bayesian model introduced in the main text is as follows. As already introduced in the main text, we model the probability of attracting $c_i(t)$ citations at time $t$ as

$$c_i(t) \sim \operatorname{Poisson}\big(\lambda_i(t)\, f_i(t)\, (m + C_i(t-1))\big), \qquad \lambda_i(t) = \begin{cases} \phi_i & \text{if } t \le T'_i, \\ \phi_i \theta_{J_i} & \text{if } t > T'_i, \end{cases} \qquad (A1)$$

where $T'_i$ is the date at which publication $i$ is published, and $t = 0$ is the time at which the preprint was posted on arXiv. We model citations at a daily rate, and it is reasonable to assume that citations on the same day have not influenced each other: they can be regarded as independent events. The Poisson distribution models exactly a random variable that counts the number of events that happen at a given rate within a given interval, making it a suitable distribution for $c_i(t)$. This is a slight generalisation of Wang, Song, and Barabási, 4 who only consider the probability of being cited at a certain time $t$. In practice, publications may attract multiple citations on a single day, and we therefore model the number of citations explicitly. This happens only infrequently: in our dataset, a publication is cited more than once on only about 6% of the days on which it is cited. The temporal decay is represented by $f_i(t)$, which follows the density of an exponential distribution,

$$f_i(t) = \frac{1}{\beta_i} e^{-t/\beta_i}.$$

For the temporal decay we assume a prior on $\beta_i$; see Fig. A.1 for a visualization of this prior. Our prior expectation is that the decay takes about 3 years, which corresponds roughly to the results found by Wang, Song, and Barabási. 4 This also agrees with other literature on the decay of citations. [38][39][40] Note that we do not use the log-normal distribution for the decay, as used by Wang, Song, and Barabási. 4 Modelling the decay using the log-normal distribution resulted in problems of convergence, which seemed to be due to multimodality of the log-normal decay, problematising model identifiability.
Using a maximum likelihood approach, as used by Wang, Song, and Barabási, 4 may miss this multimodality. Using an exponential decay improved the convergence of the Bayesian sampling. Note that even an exponential decay can lead to an initial increase and a later decrease in the number of citations, as is typical of citations. We show this in section A 2 a. There is a certain degeneracy in the model for pre-publication citations that depends on our assumptions about the prior for the decay. If we observe few pre-publication citations, this can be due to two factors: a low value of the decay $f_i(t)$ at that point $t$, or a low $\phi_i$. It is therefore important to assume reasonable priors for the temporal decay. If we assumed that $f_i(t)$ would be mostly concentrated in the first few days, we would erroneously infer a too low $\phi_i$ and a too high $\theta_{J_i}$. Although an exponential decay by definition only decreases, our prior expectation is that the decay is quite gradual. The prior on $\beta_i$ is also quite broad, allowing for substantially different decay rates.
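As a minimal sketch of how the day-level Poisson likelihood of Eq. (A1) can be evaluated on the log scale, the Python function below sums log Poisson probabilities over days and works with the logarithm of the exponential decay, in the spirit of the numerically stable specification the text describes for pystan. The function name, its arguments and the example values are hypothetical, and m is treated as a known constant.

```python
import math


def log_likelihood(daily_counts, phi, theta, beta, m, preprint_days):
    """Log-likelihood of daily citation counts c(0), c(1), ... for one article.

    Day t is Poisson with mean mu(t) = lam(t) * f(t) * (m + C(t - 1)), where
    lam(t) = phi for t <= preprint_days and phi * theta afterwards, and
    f(t) = exp(-t / beta) / beta is the exponential decay density.
    """
    total = 0.0
    cumulative = 0
    for t, c in enumerate(daily_counts):
        lam = phi if t <= preprint_days else phi * theta
        log_f = -math.log(beta) - t / beta  # log of the exponential density
        log_mu = math.log(lam) + log_f + math.log(m + cumulative)
        # log Poisson pmf: c * log(mu) - mu - log(c!)
        total += c * log_mu - math.exp(log_mu) - math.lgamma(c + 1)
        cumulative += c
    return total


print(log_likelihood([1, 0, 2], phi=1.5, theta=2.0, beta=10.0, m=30.0, preprint_days=1))
```

Keeping `log_mu` on the log scale avoids underflow when the decay density becomes very small for old articles.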
We assume that the latent citation rate of articles published in a certain journal $j$ is distributed as

$$\phi_i \sim \operatorname{LogNormal}(\Phi_j, \varepsilon_j) \quad \text{for all articles } i \text{ with } J_i = j.$$

We assume priors for these journal-level parameters that roughly correspond to the distributions of $\lambda_i$ found by Wang, Song, and Barabási 4 for various journals, assuming the journal citation multiplier is about 1. Although $\phi_i$ is modelled hierarchically as an element of a journal, causally speaking, φ determines J, not the other way around. That is, there is a certain causal effect Pr(J | do(φ)), which we assume gives rise to the probability Pr(φ | J) we model here. See Fig. A.2 for a visualization of the priors for the latent citation rates. The use of priors in fitting this type of model is also employed by Wang et al. 41 in a response to the critique by Wang, Mei, and Hicks 42 . Finally, we assume a prior on the journal citation multiplier $\theta_j$ that is centered around 1. See Fig. A.3 for a visualization of this prior. The larger citation rates observed for high-impact journals may correspond to either a higher $\Phi_j$ or a higher $\theta_j$. Our priors are relatively conservative with respect to a journal causal effect. We have assumed a prior on $\Phi_j$ that corresponds to the overall distribution of citation rates found by Wang, Song, and Barabási 4 . The prior on $\theta_j$ is centered around 1, corresponding to no journal causal effect, but still allows for larger $\theta_j$.
We use pystan 2.19.0 to perform Bayesian inference of the posterior distributions using the no-U-turn sampler. 43 In practice, citations are relatively sparsely distributed throughout time and $c_i(t) = 0$ for most $t$. Instead of specifying the probability for each $t$ separately, we can more efficiently specify the probability for only those $t$ for which $c_i(t) > 0$. The probability of observing 0 citations for a duration of $\tau$ equals the probability that an exponentially distributed waiting time, with the same rate as the Poisson distribution in Eq. (A1), exceeds $\tau$. More specifically, for $t_1$ and $t_2$ such that $c_i(t_1) > 0$ and $c_i(t_2) > 0$, the probability of observing 0 citations for all $t$ between $t_1$ and $t_2$ can then be written in closed form, assuming $t_1$ and $t_2$ do not cross the publication date $T'_i$. If they do cross $T'_i$, the time windows $(t_1, T'_i]$ and $(T'_i, t_2)$ should be considered separately. To improve the numerical stability of pystan, we use a logarithmic specification of the rate for the Poisson distribution. This also requires working with the logarithm of the temporal decay, which has the simple form $\log f_i(t) = -\log \beta_i - t/\beta_i$. Finally, we use four chains of 1 000 iterations each, using half of the iterations for warmup, with a target acceptance rate of 0.98 (adapt_delta) and a maximum tree depth of 20.
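The gap-skipping idea can be checked numerically. The sketch below is a discrete-time illustration under our own assumptions, not the expression used in the paper: within a run of zero-citation days the cumulative count C is constant, so the run contributes −(m + C) Σ λ(t) f(t) to the log-likelihood, and for an exponential decay that sum is a closed-form geometric series. Runs crossing the publication day are split there, as in the text. All function names and parameter values are hypothetical.

```python
import math


def decay_sum(a, b, beta):
    """Sum of f(t) = exp(-t / beta) / beta over integer days a..b (0 if b < a)."""
    if b < a:
        return 0.0
    r = math.exp(-1.0 / beta)
    return (r ** a - r ** (b + 1)) / ((1.0 - r) * beta)


def log_pmf_term(c, t, cumulative, phi, theta, beta, m, preprint_days):
    """Log Poisson probability of c citations on day t given C(t - 1) = cumulative."""
    lam = phi if t <= preprint_days else phi * theta
    mu = lam * math.exp(-t / beta) / beta * (m + cumulative)
    return c * math.log(mu) - mu - math.lgamma(c + 1)


def dense_log_likelihood(counts, phi, theta, beta, m, preprint_days):
    """Visit every day, including the many days without citations."""
    total, cumulative = 0.0, 0
    for t, c in enumerate(counts):
        total += log_pmf_term(c, t, cumulative, phi, theta, beta, m, preprint_days)
        cumulative += c
    return total


def sparse_log_likelihood(event_days, event_counts, horizon, phi, theta, beta, m, preprint_days):
    """Visit only days with citations; zero-citation runs are added in closed form."""
    total, cumulative, previous = 0.0, 0, -1
    # A sentinel event after the horizon closes the final run of zero-citation days.
    for day, c in list(zip(event_days, event_counts)) + [(horizon + 1, 0)]:
        first, last = previous + 1, day - 1
        pre = decay_sum(first, min(last, preprint_days), beta)
        post = decay_sum(max(first, preprint_days + 1), last, beta)
        total -= (m + cumulative) * (phi * pre + phi * theta * post)
        if day <= horizon:
            total += log_pmf_term(c, day, cumulative, phi, theta, beta, m, preprint_days)
            cumulative += c
        previous = day
    return total


params = dict(phi=1.2, theta=2.5, beta=365.0, m=30.0, preprint_days=60)
event_days, event_counts, horizon = [0, 5, 150], [1, 2, 1], 200
counts = [0] * (horizon + 1)
for day, c in zip(event_days, event_counts):
    counts[day] = c
dense = dense_log_likelihood(counts, **params)
sparse = sparse_log_likelihood(event_days, event_counts, horizon, **params)
print(math.isclose(dense, sparse, rel_tol=1e-9))
```

The sparse version touches only three event days plus four closed-form sums, instead of 201 daily terms, which is where the efficiency gain comes from.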
We perform our analysis per year (2000-2016) and field, and restrict to journals that have at least 20 articles that were published at least 30 days after being posted as a preprint on arXiv (Fig. B.1). This results in 3 892 different subsets that are fitted separately, covering 258 different journals. Seven subsets yielded diverging transitions. Only one subset showed large problems, with almost 25% of the transitions diverging. The remaining six subsets showed at most three diverging transitions. Nonetheless, we excluded all subsets that showed diverging transitions; results are unaffected by the exclusion or inclusion of these seven problematic subsets. Using a log-normal temporal decay resulted in diverging transitions for about two-thirds of the subsets.

a. Analysis
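We first analyse the mean number of citations attracted by article $i$. We can write the cumulative number of citations as $C_i(t) = C_i(t-1) + c_i(t)$ for $t > 0$, with $C_i(0) = c_i(0)$. Taking the expected value then yields

$$E(C_i(t)) = E(C_i(t-1)) + E(c_i(t)). \qquad (A12)$$

Writing out the expected number of citations received at time $t$ yields

$$E(c_i(t)) = \lambda_i(t) f_i(t) \big(m + E(C_i(t-1))\big),$$

so that we end up with the recursion

$$E(C_i(t)) = E(C_i(t-1)) \big(1 + \lambda_i(t) f_i(t)\big) + m \lambda_i(t) f_i(t). \qquad (A13)$$

This recursion has as a solution

$$E(C_i(t)) = m \prod_{\tau=0}^{t} \big(1 + \lambda_i(\tau) f_i(\tau)\big) - m, \qquad (A14)$$

which can be easily checked by substituting it in Eq. (A13):

$$\Big(m \prod_{\tau=0}^{t-1} \big(1 + \lambda_i(\tau) f_i(\tau)\big) - m\Big)\big(1 + \lambda_i(t) f_i(t)\big) + m \lambda_i(t) f_i(t) = m \prod_{\tau=0}^{t} \big(1 + \lambda_i(\tau) f_i(\tau)\big) - m.$$

Writing the product as an exponential sum of logarithms we obtain

$$E(C_i(t)) = m \exp\Big(\sum_{\tau=0}^{t} \log\big(1 + \lambda_i(\tau) f_i(\tau)\big)\Big) - m. \qquad (A15)$$

A simple Taylor expansion shows that $\log(1 + x) \approx x$ for small $x$, so that we obtain the approximation

$$E(C_i(t)) \approx m\Big(\exp\Big(\sum_{\tau=0}^{t} \lambda_i(\tau) f_i(\tau)\Big) - 1\Big). \qquad (A16)$$

The expected number of pre-publication citations is given by $E(C'_i) = E(C_i(T'_i))$, while the expected number of post-publication citations accumulated by time $t > T'_i$ is given by $E(C_i(t)) - E(C_i(T'_i))$. Using Eq. (A16), these are approximately

$$E(C'_i) \approx m\Big(\exp\Big(\phi_i \sum_{\tau=0}^{T'_i} f_i(\tau)\Big) - 1\Big)$$

and

$$E(C_i(t)) - E(C_i(T'_i)) \approx m \exp\Big(\phi_i \sum_{\tau=0}^{T'_i} f_i(\tau)\Big) \Big(\exp\Big(\phi_i \theta_{J_i} \sum_{\tau=T'_i+1}^{t} f_i(\tau)\Big) - 1\Big).$$

Taking the limit $t \to \infty$ and assuming pre-publication citations are negligible, we obtain the approximation $m(e^{\phi_i \theta_{J_i}} - 1)$ for the expected number of long-term citations. Using the approximation for the total number of citations $E(C_i(t))$ we can also obtain an approximation for the expected instantaneous number of citations. This approximation shows that the number of citations can initially increase, even if the temporal decay is exponential. We use a continuous time approximation, take the derivative of Eq. (A16) with respect to $t$, and assume $\theta = 1$ for simplicity. We then obtain the approximation

$$\frac{d\, E(C_i(t))}{dt} \approx \frac{m \phi_i}{\beta_i} e^{-t/\beta_i} \exp\big(\phi_i (1 - e^{-t/\beta_i})\big),$$

which attains its maximum at $t = \beta_i \log \phi_i$ for $\phi_i > 1$. This shows that citations first increase and then decrease, similar to what is observed empirically. Publications with a slower decay attain this peak later. Similarly, publications that have a higher latent citation rate also attain the maximum at a later time. Interestingly, this is formally equivalent to an older result from Avramescu 39 (see Eq. (5) therein). We can also analyse the variance of $C_i(t)$ and obtain the recursion

$$\operatorname{Var}(C_i(t)) = \operatorname{Var}(C_i(t-1)) + \operatorname{Var}(c_i(t)) + 2 \operatorname{Cov}(C_i(t-1), c_i(t)).$$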
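The mean-citation derivation above can be checked numerically. The Python sketch below compares the recursion, its product-form solution and the exponential approximation for an illustrative daily rate λ_i(t) f_i(t), and locates the peak of the continuous-time approximation of daily citations on a daily grid. The parameter values (φ_i = 1.5 or 3, β_i = 365 or 1 000 days, m = 30) are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math


def expected_by_recursion(rates, m):
    """E(C(t)) from E(C(t)) = E(C(t-1)) (1 + r_t) + m r_t, with r_t = lambda(t) f(t)."""
    values, previous = [], 0.0
    for r_t in rates:
        previous = previous * (1.0 + r_t) + m * r_t
        values.append(previous)
    return values


def expected_by_product(rates, m):
    """Closed-form solution m * prod(1 + r_tau) - m."""
    values, product = [], 1.0
    for r_t in rates:
        product *= 1.0 + r_t
        values.append(m * product - m)
    return values


def expected_by_approximation(rates, m):
    """Approximation m * (exp(sum r_tau) - 1), using log(1 + x) ~ x."""
    values, running_sum = [], 0.0
    for r_t in rates:
        running_sum += r_t
        values.append(m * (math.exp(running_sum) - 1.0))
    return values


def peak_day(phi, beta, m, horizon):
    """Day maximising the continuous-time approximation of daily citations (theta = 1)."""
    def daily(t):
        decay = math.exp(-t / beta)
        return m * phi / beta * decay * math.exp(phi * (1.0 - decay))
    return max(range(horizon + 1), key=daily)


m, phi, beta = 30.0, 1.5, 365.0
rates = [phi * math.exp(-t / beta) / beta for t in range(2000)]
exact = expected_by_recursion(rates, m)
print(exact[-1], expected_by_product(rates, m)[-1], expected_by_approximation(rates, m)[-1])
print(peak_day(phi=3.0, beta=1000.0, m=30.0, horizon=5000), 1000.0 * math.log(3.0))
```

Because the daily rate is small, the approximation stays within a fraction of a percent of the exact value here, and the grid maximum falls next to β log φ ≈ 1 099 days.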
Since $\mathrm{Cov}(C_i(t-1), c_i(t)) > 0$, and since $c_i(t)$ is Poisson distributed given $C_i(t-1)$ so that $\mathrm{Var}(c_i(t)) \geq \mathrm{E}(c_i(t))$, this recursion yields a variance $\mathrm{Var}(C_i(t))$ that is larger than the expected value. Hence, there is considerable uncertainty in citations in this model, even for exact values of $\varphi_i$ and $\theta_{J_i}$. This means that even for specific $\varphi_i$ and $\theta_{J_i}$, the distribution of citations would be quite skewed. It is therefore possible that skewed citation distributions emerge within a journal even if latent citation rates $\varphi_i$ are homogeneously distributed, as suggested by Waltman and Traag 18. This result is mostly due to the rich-get-richer effect, also known as the Matthew effect or cumulative advantage, which is frequently argued to explain the high variance and skewness observed in most citation distributions, dating back to Price 30. Without the rich-get-richer effect, citations $C_i(t)$ would simply be Poisson distributed around $m \sum_{\tau=0}^{t} \lambda_i(\tau) f_i(\tau)$ according to this model. In that case, citation distributions would be less skewed for specific $\varphi_i$, so that the skewness in citation distributions may require a more heterogeneous distribution of $\varphi_i$. We cannot distinguish between these two alternative possibilities based on our empirical observations. In line with previous literature, we assume the presence of a rich-get-richer effect. It would be interesting to substantiate the rich-get-richer effect empirically, but this goes beyond the scope of this paper.
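The mean recursion, the peak of the instantaneous citation rate, and the role of the rich-get-richer term can be checked numerically. Below is a minimal Python sketch, not the fitting code used for this paper. It assumes that $c_i(t)$ is Poisson distributed with rate $\lambda_i(t) f_i(t)(C_i(t-1) + m)$, which is the form the derivation above implies, together with an exponential decay $f(t) = e^{-t/\beta}/\beta$; all function names are hypothetical and any parameter values are arbitrary illustrations, not estimates.

```python
import math
import random
import statistics

# Hypothetical helpers; assumes c_i(t) ~ Poisson(lam(t) * f(t) * (C_i(t-1) + m)).

def expected_exact(lam, f, m):
    """Exact mean from the recursion: m * prod_{tau<=t} (1 + lam f) - m, for each t."""
    out, prod = [], 1.0
    for lam_t, f_t in zip(lam, f):
        prod *= 1.0 + lam_t * f_t
        out.append(m * prod - m)
    return out

def expected_approx(lam, f, m):
    """Approximation (A16): m * (exp(sum_{tau<=t} lam f) - 1), for each t."""
    out, total = [], 0.0
    for lam_t, f_t in zip(lam, f):
        total += lam_t * f_t
        out.append(m * (math.exp(total) - 1.0))
    return out

def rate(t, phi, beta, m=1.0):
    """Continuous-time derivative of (A16) for theta = 1 and f(t) = exp(-t/beta) / beta."""
    return m * phi / beta * math.exp(-t / beta + phi * (1.0 - math.exp(-t / beta)))

def peak_time(phi, beta, t_max=50.0, step=1e-3):
    """Grid search for the time at which rate() is maximal."""
    grid = [k * step for k in range(int(round(t_max / step)) + 1)]
    return max(grid, key=lambda t: rate(t, phi, beta))

def poisson(mu, rng):
    """Knuth's multiplication sampler; adequate for the small rates used here."""
    threshold, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_total(lam, f, m, rich_get_richer, rng):
    """One draw of the final C_i(t); without rich-get-richer the rate is lam f m."""
    total = 0
    for lam_t, f_t in zip(lam, f):
        base = total + m if rich_get_richer else m
        total += poisson(lam_t * f_t * base, rng)
    return total

def dispersion(draws):
    """Variance-to-mean ratio; about 1 for Poisson counts."""
    return statistics.variance(draws) / statistics.mean(draws)
```

For example, `peak_time(3.0, 2.0)` lands within one grid step of $2 \log 3 \approx 2.197$. With `f = [math.exp(-t / 10) / 10 for t in range(100)]` and a constant rate of 2, totals simulated without the rich-get-richer term have a dispersion close to 1, whereas with it the variance is several times the mean.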

Appendix B: Results
We here provide more details of the results discussed in the main text. We first present results for all 3 885 subsets of journals per year and field (excluding the seven diverging subsets, see Appendix A 2). As Fig. B.2 shows, the overall patterns are the same as in Fig. 4. There is some variation per field and year, but the relationship between impact and the multiplier, and between impact and the median latent citation rate, is apparent for all fields and years. The variation over fields is shown in Fig. B.4A. The multiplier seems to be relatively high for Statistics, whereas Quantitative Finance shows a relatively low multiplier. Possibly, statisticians do not regularly follow new preprints on arXiv, so that articles attract relatively few citations before journal publication. There seems to be a trend of increasing journal citation multipliers over the years (Fig. B.4B), but the trend is not very clear. It is not immediately clear how this relates to the decreasing correlation between citations and impact factor over time 44.

In the main text, we argued that the journal causal effect Pr(C | do(J)) was not affected by the selection on arXiv papers A. The same does not hold for the estimate Pr(J | do(φ)), as A acts as a mediator: φ may affect A, which in turn may affect J (e.g. some journals may have policies against publishing preprints). The effect of φ on J might then only hold for arXiv preprints. To better understand this possible mediating effect, we computed for each journal the proportion of arXiv papers it published, including only arXiv papers with a preprint duration of at least 30 days. We find no discernible relationship between journal impact and the proportion of arXiv papers (Fig. B.5). This observation is again confounded by the latent citation rate φ, but it would be rather surprising to have a confounding effect that exactly cancels out the actual causal effect of A on J, so that we observe no correlation between A and J. In other words, A is unlikely to act as a mediator, suggesting that high-impact journals indeed select articles with higher latent citation rates.
The median predicted number of citations closely aligns with the observed number of citations (Fig. B.6). Unsurprisingly, the predicted number of citations closely follows the empirical patterns observed in Fig. 3. The 95% interval of the predicted number of citations appears to lie a roughly constant multiplicative factor away from the median predicted number of citations. A reasonable estimate is that, in our model, citations can lie between half and twice the observed number of citations. This interval quantifies both the uncertainty of the inferred parameters and the uncertainty arising from the citation dynamics themselves. For lower numbers of citations the interval is somewhat broader.

Figure: Detailed results. Dependency of the citation multiplier θ, the median latent citation rate e^Φ and ε on journal impact (a-c). Points show the median and error bars the 95% percentile interval. Panels (d-f) and (g-i) show the same results separated per year and per field, respectively.
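A multiplicative interval of this kind can be summarised from predictive draws as in the following minimal Python sketch. The function `interval_factors` is hypothetical and assumes the draws are positive. Factors of about 0.5 and 2 would correspond to the "between half and twice" estimate above.

```python
import statistics


def interval_factors(draws):
    """Median of predictive draws and the multiplicative distance of the central
    95% percentile interval from it: (median, lower / median, upper / median)."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    median = statistics.median(draws)
    return median, cuts[0] / median, cuts[-1] / median
```

Because both bounds are divided by the median, the factors do not change when all draws are rescaled by a constant. This is why a constant multiplicative interval shows up as a band of fixed width on a logarithmic axis.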