Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science

With the rise of Wikipedia as a first-stop source for scientific knowledge, it is important to compare its representation of that knowledge to that of the academic literature. Here we identify the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles in total) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia. We find that a journal's academic status (impact factor) and accessibility (open access policy) both strongly increase the probability of it being referenced on Wikipedia. Controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to paywall journals. One of the implications of this study is that a major consequence of open access policies is to significantly amplify the diffusion of science, through an intermediary like Wikipedia, to a broad audience.


Introduction
Wikipedia, one of the most visited websites in the world, 1 has become a destination for information of all kinds, including information about science (Heilman & West, 2015; Laurent & Vickers, 2009; Okoli, Mehdi, Mesgari, Nielsen, & Lanamäki, 2014; Spoerri, 2007). Given that so many people rely on Wikipedia for scientific information, it is important to ask whether Wikipedia's coverage of science is a balanced, high quality representation of the knowledge within the academic literature. One approach to asking this question involves looking at references used in Wikipedia articles. Wikipedia requires all claims to be substantiated by reliable references, 2 but what, in practice, are "reliable references"? An intuitive approach is to compare the sources Wikipedia editors use to the sources scientists value most. In particular, within the scientific literature, a journal's status is often associated, albeit problematically (Seglen, 1997), with its impact factor. If status within the academic literature is taken as a "gold standard," Wikipedia's failure to cite high impact journals of certain fields would constitute a failure of coverage (Samoilenko & Yasseri, 2014), whereas a high correspondence between journals' impact factors and citations in Wikipedia would indicate that Wikipedia does indeed use reputable sources (Evans & Krauthammer, 2011; Nielsen, 2007; Shuai, Jiang, Liu, & Bollen, 2013).
Yet high impact journals often require expensive subscriptions (Björk & Solomon, 2012). The costs are, in fact, so prohibitive that even Harvard University has urged its faculty to "resign from publications that keep articles behind paywalls" because the library "can no longer afford the price hikes imposed by many large journal publishers" (Sample, 2012). Consequently, many within the scientific community advocate journals that provide free access to research, that is, "open access" (OA) journals (Van Noorden, 2013). A lively debate has arisen on the impact of OA on the scientific literature, with some studies showing a citation advantage to publishing in OA versus paywall journals (Eysenbach, 2006a,b; Gargouri et al., 2010; "The Open Access Citation Advantage Service") whereas others find none (Davis, 2011; Davis, Lewenstein, Simon, Booth, & Connolly, 2008; Gaulé & Maystre, 2011; Moed, 2007).
Regardless of their impact on the scientific literature, OA journals may have a tremendous impact on the diffusion of scientific knowledge beyond this literature. To date, this potential of OA policies has been a matter chiefly of speculation (Heilman & West, 2015; Trench, 2008). Previous research has found that OA articles are downloaded from publishers' websites more often and by more people than closed access articles (Davis, 2010, 2011), but it is currently unclear by whom, and to what extent open access affects the use of science by the general public. We hypothesize that Wikipedia, with more than 8.5 million page views per hour, 3 is a new but crucial pathway through which the public consumes science and this diffusion of science may relate to its accessibility in two ways. By referencing findings from paywall journals that may be prohibitively expensive, Wikipedia distills and diffuses these findings to the general public. On the other hand, Wikipedia editors may be unable to access expensive paywall journals, 4 and consequently reference the easily accessible articles instead. For example, Luyt and Tan's (2010) study found accessibility to drive the selection of references in a sample of Wikipedia's history articles. In this case Wikipedia "amplifies" open access science by broadcasting its (already freely accessible) findings to millions. This "amplifier" effect may thus constitute one of the chief effects of open access.

Correspondence Between Academic and Wikipedia Statuses
This article tests both the distillation and amplifier hypotheses by evaluating which references Wikipedia editors use and do not use. In particular we study how a journal's status within the scientific community (impact factor) and accessibility (open access policy) relate to its status within Wikipedia (percent of its articles referenced on Wikipedia). It is important to note that an observed correspondence may be driven by a variety of mechanisms besides accessibility. First, the status ordering of academic journals as measured with impact factors may have only a tenuous relationship with the importance and notability (considerations of special relevance to Wikipedia 5) of the published research. Citations, and therefore impact factors, are in part a function of the research field (Seglen, 1997), and may be affected by factors as circumstantial as whether a paper's title contains a colon (Jamali & Nikzad, 2011; Seglen, 1997). Second, the academic status ordering results from the objectives of millions of scientists and institutions, and may be irrelevant to the unique objectives of Wikipedia. Wikipedia's key objective is to serve as an encyclopedia, not a medium through which scientists communicate original research. 6 Relative to the decentralization of the scientific literature, Wikipedia is governed by explicit, if flexible, policies and a hierarchical power structure (Butler, Joyce, & Pike, 2008; Shaw & Hill, 2014). Apart from a remark that review papers serve Wikipedia's objectives better than primary research articles, Wikipedia's referencing policies generally pass no judgment over which items within the scientific literature constitute "the best" evidence in support of a claim. 7 Wikipedia's objectives and explicit, centrally accessible policies differ from the decentralized decisions that produce status orderings within the scientific literature and do not imply that the two status orderings should correspond.
Indeed, if editors are not scientists themselves they need not even be aware that journal impact factors exist. 8 On the other hand, despite the well-worn caveats, prestigious, high-impact journals may publish findings that are more important to both academics and Wikipedia's audience. In fact, a Wikipedia editor's expectation that the truly important research resides within high-impact journals may be enough to predispose them to want to reference such journals. Second, little is known about editors of science-related articles (West, Weber, & Castillo, 2012); they may be professional scientists with access to these high-impact journals, resulting in both the motivation and opportunity to reference them.

Wikipedia References and Academic Status
The first large-scale study of Wikipedia's scientific references was performed by Finn Årup Nielsen (Nielsen, 2007). Nielsen found that the number of Wikipedia references to the top 160 journals, extracted from the cite-journal citation templates, correlated modestly with the journal's Journal Citation Reports impact factor. This implication that Wikipedia preferentially cites high impact journals is delicate in part because the data used in the study included only a subset of journals with references that appear in Wikipedia, not journals that were and were not referenced. It is possible, albeit unlikely, that an even larger number of prestigious journals, made invisible by the methodology, are never referenced on Wikipedia at all, weakening the correlation to an unknown degree, or that the referenced journals are simply those that publish the most articles (see Nielsen 2007: Figure 1). Shuai et al. (2013) also found modest correlations when they investigated a possible correspondence between the academic rank of computer science papers, authors, and topics and their Wikipedia rank.
The altmetrics movement has also explored Wikipedia as a nonacademic venue on which academic literature makes an impact (ALM, Fenner, & Lin, ;"altmetrics,";Priem, 2015). Evans and Krauthammer (2011) examined the use of Wikipedia as an alternative measure of the scholarly impact of biomedical research. The authors correlated scholarly metrics of biomedical articles, journals, and topics with Wikipedia citations and, in contrast to other studies, included in some of their analyses a random sample of journals never referenced on Wikipedia. The authors also recorded a journal's OA policy but, unfortunately, do not appear to have used this information in analyses.

Open Access and the Web
The rather voluminous literature on OA has focused primarily on effects on the academic literature. 9 There is some debate on the size and direction of OA effects. Some evidence demonstrates that OA articles gain a citation advantage (Eysenbach, 2006a, 2006b; Gargouri et al., 2010; "The Open Access Citation Advantage Service"), whereas other evidence shows no such effect (Davis, 2011; Davis et al., 2008; Gaulé & Maystre, 2011; Moed, 2007). Regardless of the impacts on scientists in developed nations, increased accessibility through OA does yield benefits to scientists from developing nations (Evans & Reimer, 2009).
The promise of OA for disseminating scientific information to the world at large has gained much less attention (Trench, 2008; for an exception see Heilman & West, 2015). Yet, more and more of the world turns to the web for scientific information. For instance, as early as 1999 a full 20% of American adults sought medical and science information online (Miller, 2001). What's more, one who actively seeks such information within the academic literature will quickly discover that, despite the paywalls, many important and impactful research articles are made freely available by their authors or third parties (Björk, Laakso, Welling, & Paetau, 2014; Wren, 2005). This is to say nothing of the fact that science may also be disseminated through distillation of its findings into venues like Wikipedia or science-centric websites and blogs so that, here too, the impact of OA may be limited. Although full texts of the most impactful literature are, at least nominally, behind a paywall (Björk & Solomon, 2012), do Wikipedia's editors consult these texts? If they cite them in Wikipedia, have they consulted the full texts beyond a freely available abstract before referencing? If the academic literature is any guide, referenced material is sometimes consulted rather carelessly (Broadus, 1983; Rekdal, 2014). In short, the current understanding of the relationship between OA and the general public in the literature is limited at best.

Shortcomings and Our Contribution
In addition to the role of accessibility, a number of substantive and methodological shortcomings remain. First, it is unclear if professional scientists edit Wikipedia's science articles. As we will show, a preponderance of paywall references would suggest, albeit indirectly, this to be the case. 10 The scant existing evidence indicates that science articles are edited by people with general expertise, relative to the more narrow experts of popular culture articles (West et al., 2012). Second, most previous studies have completely ignored the articles that are never referenced on Wikipedia, thus sampling on the dependent variable. The only notable exception (Evans & Krauthammer, 2011) treated the unreferenced articles outside the main analytic framework. Although the framework treated (referenced) articles or journals as the unit of analysis, the unreferenced articles and journals were treated as a homogeneous group.
This study extends existing work in three chief ways. First, it models the role of accessibility (OA status) on referencing. Second, it covers all major research areas of science by observing rates at which Wikipedia references nearly 5,000 journals, accounting for nearly 20 million articles. Third, it treats unreferenced articles in the same analytic framework as those referenced. Yet the study is not without its own limitations, which will be outlined more fully later in the article. Chief among these is that article-level characteristics are operationalized by the characteristics of the publishing journal. For example, the accessibility of articles is operationalized by their journal's OA policy, when, in fact, free access to many paywall articles exists through sanctioned or unsanctioned file-sharing (Björk et al., 2014; Wren, 2005). Thus, any observed advantage of OA referencing may be biased downward, that is, an underestimate of the true effect (see the Conclusion for a discussion of measurement error).

Data Sample
Journal data. Our analysis uses journal-level data from thousands of journals indexed by Scopus. Indexing over 21,000 peer-reviewed journals and with more than 2,800 classified as open access, Scopus is the world's largest database of scientific literature. 11 We obtained information on the 250 highest-impact journals within each of the following 26 major subject areas 12  Social Science, Veterinary Science, Dental, Health Professions. Assignment of journals to these broad subject areas is not exclusive; many journals fall into more than one category. As a result of cross listing, the list of candidate journals was less than 6,500. The final data consisted of 4,721 unique journals, 335 (7.1%) of which are categorized by the Directory of Open Access Journals (DOAJ) as "open access." Journals were also categorized more narrowly using the more than 300 "All Science Journal Classification" (ASJC) subject codes, 13 for example, animal science and zoology, biophysics, etc. These narrow codes were used to identify journals that address similar topics and thus indicate whether the journal is at risk for reference vis-à-vis demonstrated demand. Journals with at least one narrow subject code in common were considered "neighbors," and if at least one of these neighboring journals has been referenced, the original "ego" journal was considered to be at risk for reference as well. Journals with no demonstrable demand were excluded from analysis. As an example, consider the journal Science. It is listed under (ASJC) subject code 1000, general science. Other journals with this code (the "neighbors" of Science) are Nature, PNAS, and Language Awareness. Language Awareness is cross-listed under five other subject codes. Impact factor was measured by the 2013 SCImago Journal Rank (SJR) impact factor. SJR correlates highly with the more conventional impact factor but takes into account self-citations and the diverse prestige of citing journals (González-Pereira, Guerrero-Bote, & Moya-Anegón, 2010; Leydesdorff, 2009).
English Wikipedia data. References in the English Wikipedia were extracted from the November 15, 2014 database dump of all articles. We parsed every page and, following Nielsen (2007), extracted all references that use Wikipedia's cite-journal template. Because it allows editors to easily include inline references that are automatically rendered into an end-of-article bibliography, this template is the recommended way for editors to reference scientific sources in Wikipedia. 14 In all, there were 311,947 "cite-journal" tags in the English Wikipedia. An exploratory analysis of the 49 largest non-English Wikipedias can be found in the Appendix.
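The template extraction step can be illustrated with a minimal sketch. This is not the parser used in the study; it assumes the template is written as {{cite journal |name=value ...}}, ignores templates that contain nested templates or piped wikilinks, and uses a hypothetical function name and invented sample values.

```python
import re

# Matches a {{cite journal ...}} template that contains no nested braces.
# Assumption: nested templates and piped links inside values are out of scope.
CITE_JOURNAL = re.compile(
    r"\{\{\s*cite[ _]journal\s*\|([^{}]*)\}\}", re.IGNORECASE
)

def extract_cite_journal(wikitext):
    """Return one dict (lower-cased parameter name -> value) per template."""
    references = []
    for match in CITE_JOURNAL.finditer(wikitext):
        fields = {}
        for part in match.group(1).split("|"):
            name, sep, value = part.partition("=")
            if sep:  # skip positional parameters that have no "="
                fields[name.strip().lower()] = value.strip()
        references.append(fields)
    return references

# Invented wikitext with two references (all values are placeholders).
sample = (
    "Some claim.<ref>{{cite journal |last=Doe |title=An example study "
    "|journal=Example Journal |year=2010 |doi=10.1234/example.1}}</ref> "
    "Another claim.<ref>{{Cite journal|journal=Other Journal"
    "|title=Second example}}</ref>"
)
refs = extract_cite_journal(sample)
```

A production parser would need to handle nested templates and multi-line formatting, which is why a dedicated wikitext parser is usually preferable to a regular expression.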

Matching Scopus Journals to Wikipedia References
We checked each of the referenced journal names on Wikipedia against a list of Scopus-indexed journal names and common Institute for Scientific Information (ISI; now Thomson Reuters) journal name abbreviations. Of the 311,947 cite-journal tags, 203,536 could be linked to journals indexed by Scopus. Many of these references were nonunique, whereas our outcome of interest is whether articles from a journal are referenced on Wikipedia at all, not how many times. Therefore, to ensure that the counts for each journal included only unique articles, we distinguished articles by their DOIs and, if an article's DOI was not available, we used the article's title. Scopus's coverage of the output of various journals varies widely; our counts included only those articles published within the years of Scopus coverage.
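The deduplication rule described above (DOI first, title as a fallback) might look like the following sketch. The normalization choices (lower-casing, collapsing whitespace) and the function name are assumptions made for illustration; the study does not specify them.

```python
from collections import defaultdict

def unique_articles_per_journal(references):
    """Count distinct referenced articles per journal.

    Each reference is a dict with "journal" and optionally "doi" and "title".
    An article is identified by its DOI when present, otherwise by its title.
    """
    seen = defaultdict(set)
    for ref in references:
        doi = (ref.get("doi") or "").strip().lower()
        # Assumed normalization: case-insensitive, whitespace-collapsed titles.
        title = " ".join((ref.get("title") or "").lower().split())
        key = ("doi", doi) if doi else ("title", title)
        if key[1]:  # ignore references with neither a DOI nor a title
            seen[ref["journal"]].add(key)
    return {journal: len(keys) for journal, keys in seen.items()}

# Invented references: two duplicates collapse, leaving 2 and 1 articles.
refs = [
    {"journal": "Journal A", "doi": "10.1234/a.1", "title": "First study"},
    {"journal": "Journal A", "doi": "10.1234/A.1", "title": "First study"},
    {"journal": "Journal A", "title": "Second  Study"},
    {"journal": "Journal A", "title": "second study"},
    {"journal": "Journal B", "doi": "10.1234/b.7"},
]
counts = unique_articles_per_journal(refs)
```

One limitation of this simple key: an article referenced once with a DOI and once with only a title would be counted twice.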
In the end we matched 32,361 unique articles (and 55,262 total references) to our subset of Scopus journals (top 250 in each research field). Of the top Scopus journals, 2,005 are never referenced on the English Wikipedia. In most cases observed "journal names" that did not match to journals in Scopus were not academic journals but popular newspapers and magazines. Table A1 in the Appendix displays the 20 most frequently referenced sources that we were unable to link to Scopus. The top three nonscience  sources are Billboard, National Park Service, and Royal Gazette. However, efforts to match Wikipedia references to Scopus were imperfect, and the list also includes a handful of academic journals, including The Lancet.

Journal Versus Article-Level Unit of Analysis
We chose to take journals instead of individual articles as our unit of analysis for several reasons. First and most important, accessibility of articles, the focal point of this study, was measured at the journal level by whether the journal is or is not OA. Switching the unit of analysis to individual articles would have simply assigned the same value of accessibility to all articles from a particular journal. Second, although article-level citations are an attractive, finely grained metric, a journal's impact factor is also designed to capture citation impact, albeit more coarsely. The general topic of any given article is also well captured by the host journal's Scopus-assigned topic(s). Lastly, the matching of Wikipedia journal title strings to Scopus required some manual matching and these efforts were more practical at the level of thousands of journals instead of hundreds of thousands of articles.

Percent_Cited and Other Variables
We present some of our results in terms of percent_cited, the percent of a journal's articles that are referenced on Wikipedia. An equivalent interpretation of this journal-level metric is the probability that a given article from a journal is referenced on Wikipedia. Figure 1 illustrates the distribution (kde) of percent_cited.
As Figure 1 demonstrates, the vast majority of journals that scientists use are referenced on the English Wikipedia very little: on average 0.19% of a journal's articles are referenced. 15 As mentioned, the academic status of journals was measured using (SCImago) impact factors. To limit the influence of the few journals with uncommonly high impact factors, the impact factor variable was (natural) log-transformed when used in the models. Figure 2 displays the distribution of impact factor and log-impact factor; to aid visualization only journals with impact factor ≤ 15 are shown. Table 2 presents the summary statistics for key variables: percent_cited, impact factor, ln (impact factor), and open access. Additionally, analyses use dummies for the 26 subject categories (for example, psychology: 0 or 1). Lastly, Figure 3 displays a scatter plot of the key dependent variable, percent_cited, versus impact factor and open access.
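As a small illustration of the metric, the sketch below computes percent_cited for a hypothetical journal. The function name and the example counts are invented for illustration.

```python
def percent_cited(unique_referenced, articles_published):
    """Percent of a journal's articles referenced at least once on Wikipedia."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return 100.0 * unique_referenced / articles_published

# Hypothetical journal: 19 of its 10,000 indexed articles are referenced.
example = percent_cited(19, 10_000)

# Pooled rate using the totals reported in this article
# (32,361 unique referenced articles; roughly 19.4M articles in total).
pooled = percent_cited(32_361, 19_400_000)
```

The pooled rate comes out at roughly 0.17%. It weights large journals more heavily and so need not equal the average of the journal-level values.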
The scatter plots appear to show a modest relationship between a journal's impact factor and percent_cited, the percent of its articles referenced on Wikipedia, especially when considering journal size (dot size). The next section analyzes these relationships statistically.

Results
We first present results of English Wikipedia's coverage. We ask, does Wikipedia draw equally on all branches of science? Next we focus on the role played by a journal's status and accessibility in predicting Wikipedia references. An exploratory analysis of references in the 49 largest non-English Wikipedias can be found in the Appendix. Figure 4 summarizes which branches of the scientific literature the English Wikipedia draws upon. The left panel shows the number of articles published by the top 250 journals in each field. The right panel shows the percent of those articles that are referenced at least once in the English Wikipedia. Figure 4 indicates that the coverage of science, as measured by the use of references, is very uneven and limited across scientific fields (Samoilenko & Yasseri, 2014). The social sciences represent a relatively small candidate literature but a relatively large portion of this literature is referenced on the English Wikipedia (0.4-0.5%). At the other end of the spectrum, dentistry, also a relatively small literature, is rarely referenced (< 0.05%). The ordering of disciplines by percent_cited does not engender a simple explanation. For example, such an ordering does not appear correlated with traditional distinctions like hard versus soft science, or basic versus applied. This finding is echoed by Nielsen (2007), who found that "computer and Internet-related journals do not get as many [references] as one would expect if Wikipedia showed bias towards fields for the 'Internet-savvy'." The highly uneven referencing across disciplines suggests that discipline should be controlled for in any statistical model, as is done below.

Status and Accessibility
We now present results from an intuitive statistical model that predicts the probability p that an article from a given journal will be referenced given that journal's characteristics. The data-generating process is assumed to be binomial: each journal $i$ publishes $n_i$ articles and each of these articles is at risk $p_i$ of being referenced in Wikipedia, where $p_i$ depends on the journal. The probability that a journal $i$ has $k$ of its $n_i$ articles referenced in Wikipedia is thus: $\Pr(y_i = k \mid n_i, p_i) = \binom{n_i}{k} p_i^{k} (1 - p_i)^{n_i - k}$. $p_i$ is assumed to be a (logistic) function of the journal characteristics $x_i$ (e.g., impact factor): $\ln\left(\frac{p_i}{1 - p_i}\right) = x_i \beta$, where $\beta$ are the parameters to be estimated. The model just described is commonly used for proportional outcomes; it embeds the familiar logistic regression within a binomial process and is known as a generalized linear model (GLM) of the binomial-logit family (Hardin & Hilbe, 2012: 153-4). Table 3 displays estimates from this model of how journal characteristics are related to its p, probability of referencing, fitted to the English Wikipedia.
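To make the binomial-logit setup concrete, the following sketch fits a deliberately reduced version of such a model by Newton-Raphson: an intercept plus a single covariate (an OA indicator), on invented toy counts. It is not the estimation code behind Table 3, whose model also includes log impact factor and field dummies; the function name, data, and iteration settings are assumptions.

```python
import math

def fit_binomial_logit(x, k, n, iters=50, tol=1e-10):
    """Fit logit(p_i) = b0 + b1 * x_i to grouped binomial data.

    x: covariate per journal; k: referenced articles; n: published articles.
    Uses Newton-Raphson on the binomial log-likelihood.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, k, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = ki - ni * p            # score contribution
            w = ni * p * (1.0 - p)         # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (invented): two paywall journals (x=0) and two OA journals (x=1).
x = [0, 0, 1, 1]
k = [2, 4, 3, 6]
n = [1000, 2000, 500, 1000]
b0, b1 = fit_binomial_logit(x, k, n)
odds_ratio = math.exp(b1)
```

With a single binary covariate the fitted odds ratio equals the ratio of the pooled odds in the two groups, here (9/1491)/(6/2994), which gives a convenient check on the fit.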
The column of odds ratios indicates how the odds of referencing change with unit changes in the independent variables. For indicator variables, for example, OA, these ratios are interpreted as the increase in odds when the indicator is true. For example, the odds that an article is referenced on Wikipedia increase by 47% if the article is published in an OA journal.
To interpret these results in terms of probabilities rather than odds ratios we must evaluate the model at particular values of the variables. Figure 5 displays the observed and predicted references for a range of values of impact factor and open_access. The indicator variables designating particular disciplines are set at their modes (0).
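The conversion from an odds ratio to a change in probability can be sketched as follows. The baseline of 0.19% is borrowed from the average percent_cited reported earlier purely as an illustrative starting value; the model's own baseline depends on field and impact factor, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability implied by scaling the baseline odds."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Illustrative only: a 0.19% baseline with 47% higher odds (odds ratio 1.47).
oa_probability = apply_odds_ratio(0.0019, 1.47)
```

Here the result is about 0.28%. Because referencing probabilities are so small, odds and probabilities nearly coincide, so a 47% increase in odds is close to a 47% relative increase in probability.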
The figure demonstrates that a journal's impact factor has a positive and asymptotic effect on the percent of its contents referenced in the English Wikipedia (percent_cited). Open access journals (red dots) are relatively uncommon, but these journals are referenced more often than paywall journals of similar impact factor. For example, in our sample of psychology journals, OA journals have an average impact factor of 1.59, whereas closed access journals have an average impact factor of 1.77. Yet in the English Wikipedia, editors reference an average of 0.49% of OA journals' articles but only 0.35% of closed access journals' articles, despite the latter's higher impact factors.

Conclusion
This article examined in unprecedented detail and scale how the English language Wikipedia references the scientific literature. (The appendix contains an exploratory analysis of scientific referencing in Wikipedias of 49 other languages.) Of central interest was the relationship of a journal's academic status and accessibility to its probability of being referenced in Wikipedia. Previous studies have focused only on the role of academic status on referencing in the English Wikipedia and have often ignored unreferenced articles. In contrast, we began with a large (4,721 journals, 19.4M articles) corpus of scientific literature that scientists use, and estimated a statistical model to identify the features of journals that predict how much Wikipedia editors use them as sources.
We found that a journal's academic status (impact factor) routinely predicts its appearance on Wikipedia. We also demonstrated, for the first time, that a journal's accessibility (OA policy) generally increases probability of referencing on Wikipedia as well, albeit less consistently than its impact factor. The odds that an OA journal is referenced on the English Wikipedia are about 47% higher compared to closed access, paywall journals. Moreover, of closed access journals, those with high impact factors are also significantly more likely to appear in the English Wikipedia. Therefore, editors of the English Wikipedia act as "distillers" of high quality science by interpreting and distributing otherwise closed access knowledge to a broad public audience, free of charge. Moreover, the English Wikipedia, as a platform, acts as an "amplifier" for the (already freely available) OA literature by preferentially broadcasting its findings to millions.

Limitations and Directions for Future Research
Our findings are not without limitations. First and foremost, it bears emphasis that this study did not investigate the nature of Wikipedia's sources as a whole (see Ford, Sen, Musicant, & Miller, 2013 for an excellent examination of sources). Only a fraction of Wikipedia's references use the scientific literature, and this is the subset on which we focused. Consequently the present study cannot address the concern expressed by others, for example, Luyt and Tan (2010), that sources outside the scientific literature are used too heavily in scientific articles. Second, the study was cross-sectional in nature; it is conceivable that OA articles differ from closed access, paywall articles in their relevance to Wikipedia. Future work can test the potential confounding factor of unmeasured relevance by observing reference rates for articles which have been experimentally assigned to open and closed-access statuses, as has been done by some psychology journals (Davis et al., 2008).
Third, the study measured accessibility of articles by the OA policy of the publishing journals. However, many articles in paywall journals are made freely available by their authors or third parties (Björk et al., 2014; Wren, 2005). The resulting error in the measurement of accessibility may bias the observed advantage of OA in either direction: if OA articles from paywall journals, erroneously coded as closed access, are referenced at higher rates than the journals' truly closed access articles (Gargouri et al., 2010; Harnad & Brody, 2004), the true advantage of OA will be even higher than we observed. In the (unlikely) case that OA articles in paywall journals are referenced less than closed access articles, the observed open access advantage will be an overestimate. The academic status of articles is also operationalized by a journal characteristic, its impact factor. In fact, many articles out- or under-perform their journal's impact factor. Although this measurement error likely adds noise to the data, it probably does so without biasing the estimated effect of impact factor on referencing in one direction or another.

The Impact of Open Access Science
The chief finding of this study bears emphasis. We believe the existing discussion of open access has focused too narrowly on the academic literature. Early results showing that open access improves scientific outcomes such as citations have been tempered by newer experimental evidence showing small to null causal effects, and a lively debate has ensued. Our research shifts focus to diffusion, showing that open access policies have a tremendous impact on the diffusion of science to the broader general public through an intermediary like Wikipedia. This effect, previously a matter primarily of speculation, has empirical support. As millions of people search for scientific information on Wikipedia, the science they find distilled and referenced in its articles consists of a disproportionate quantity of open access science.
these, we extracted tags containing "journal" or "doi." Thus the process for obtaining scientific references in non-English Wikipedias did not take into account language-specific tags. Non-English Wikipedias may also reference domestic scientific journals that are not indexed by Scopus. Thus, this exploratory approach surely undercounts scientific references in non-English Wikipedias.
The English Wikipedia referenced by far the greatest number of unique articles. Figure 1 displays the number of unique articles referenced in other Wikipedias, sorted by size (total articles).

Empirical Strategy
Certainly not all findings published in the academic literature belong on Wikipedia. Only small subsets of published findings are important and notable enough to be referenced in Wikipedia. Ideally, studies of how Wikipedia editors reference sources should explain which items in this smaller subset are and are not referenced. Nevertheless, previous studies have struggled to distinguish the candidate articles that are at risk for reference from those that do not belong on Wikipedia. Yet, to model referencing decisions with all articles, including the tens of millions of articles never referenced on Wikipedia, is likely to result in a model that predicts that no article will ever be referenced. Consequently most studies have voluntarily hobbled themselves by simply modeling only on the subset of referenced articles.
Here we propose a compromise strategy based on "demonstrated demand." The idea is simple: Articles are at risk for reference if other articles on the same topic are referenced. Topical reference indicates that there is demand from Wikipedia editors for literature on the topic and that an article's characteristics (e.g., accessibility, status) may determine which of the candidate articles an editor finds and references. Conversely, if articles on a given topic are never referenced, it is likely that Wikipedia editors do not "demand" literature on this topic, no matter the accessibility or status of the supply. Demonstrated demand exists at the level of topics and, like accessibility and status, we identify an article's topic at the journal level. Demonstrable demand is also a language-dependent metric: Some Wikipedias may lack editors with expertise or interest in, for example, dentistry, thereby consigning all dentistry journals to irrelevance with regards to referencing decisions (but not irrelevant for analysis of coverage, of course). To calculate demonstrated demand we identify for each journal its topical "neighbors," defined as other journals that share at least one narrow (ASJC) subject code, and assign demand of 0 if none of these neighbors are ever referenced in a particular Wikipedia. Only one journal, Prevenzione & Assistenza Dentale, had no neighbors while the mean neighborhood size was 144.8. Figure 4 displays the distribution (kde) of neighborhood size. Table A2 contains the percentages of journals that were excluded from estimating models in each language. This percentage varies widely. For example, only 0.17% of journals were not used for the English Wikipedia model, 49.87% for Slovak, and 100% for Volapük. These numbers correspond directly to demonstrable demand for various research literatures by the editors of each Wikipedia.
While the English Wikipedia references 32,000 articles from top journals, the Slovak Wikipedia references only 108 and Volapük references 0.
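The demonstrated-demand rule can be sketched in a few lines. This is an illustrative reading of the description above, not the study's code: the function name is hypothetical, the subject codes other than 1000 are invented, and the sketch follows the literal rule that only the neighbors' references (not the ego journal's own) establish demand.

```python
def demonstrated_demand(subject_codes, referenced):
    """Map each journal to 1 if any topical neighbor is referenced, else 0.

    subject_codes: dict of journal name -> set of narrow subject codes
    referenced: set of journals referenced at least once in a given Wikipedia
    """
    demand = {}
    for ego, ego_codes in subject_codes.items():
        # Neighbors share at least one narrow subject code with the ego.
        neighbors = [
            other
            for other, codes in subject_codes.items()
            if other != ego and ego_codes & codes
        ]
        demand[ego] = int(any(name in referenced for name in neighbors))
    return demand

# Code "1000" follows the Science example in the text; the other codes and
# the "Journal X/Y" and "Isolated Journal" entries are invented.
subject_codes = {
    "Science": {"1000"},
    "Nature": {"1000"},
    "PNAS": {"1000"},
    "Language Awareness": {"1000", "hypothetical-code-a"},
    "Journal X": {"hypothetical-code-b"},
    "Journal Y": {"hypothetical-code-b"},
    "Isolated Journal": {"hypothetical-code-c"},
}
referenced = {"Science", "Nature"}
demand = demonstrated_demand(subject_codes, referenced)
```

Journals with demand of 0 (here the two unreferenced neighbors and the journal with no neighbors) would be dropped before estimating that language's model. The pairwise comparison is quadratic in the number of journals, which is acceptable at the scale of a few thousand journals.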

Results
From each Wikipedia's model, two parameters are of focal interest: the odds ratio (of probability of referencing) when OA is True, and odds ratio when (log of) impact factor increases by 1 unit. Table A3 shows these odds ratios and associated p-values for each Wikipedia. Ratios significant at the 0.05 level are bolded.
While earlier results showed that both accessibility and status increase the odds that a journal will be referenced in the English Wikipedia, the relative strength of these effects varies across languages. Some Wikipedias, like the Turkish, prioritize a journal's academic status over accessibility; the odds of referencing high status journals are nearly 200% higher than lower status journals. Other Wikipedias, like the Serbian, prioritize accessibility over status; the odds of referencing an OA journal are 275% higher than a paywall journal.
Intuition and previous work suggest that reliance on OA literature is associated negatively with a country's level of economic development (Evans & Reimer, 2009), yet this pattern is not apparent in Table A3. For example, India and Ukraine, relatively poor countries naturally associated with the Hindi and Ukrainian Wikipedias, actually exhibit a small preference against OA literature, while a relatively wealthy country like Sweden has a Wikipedia that exhibits a huge preference for OA literature. The unexpected patterns may be because of the influence of bots (Steiner, 2014). For example, about a third of all articles on the Swedish Wikipedia were
TABLE A2. Percent of journal data that are not used in estimating language-specific models (demonstrated demand = 0). These percentages indicate the portion of research areas for which there is no demonstrable demand from (language-specific) Wikipedia editors.