Citations and publication rate of preprints on pharmacological interventions for COVID-19: The good, the bad, and the ugly

Background: Preprints are preliminary reports that have not been peer-reviewed. In December 2019, a novel coronavirus appeared in China, and since then, scientific production, including preprints, has drastically increased. In this study, we intend to evaluate how often preprints regarding pharmacological interventions against COVID-19 were cited, in spite of the fact that some of these preprints remained unpublished.
Methods: We conducted a search on medRxiv and bioRxiv to identify preprints related to pharmacological interventions against SARS-CoV-2 posted from January 1, 2020 to March 31, 2020. We included any study type that addressed or reported data on pharmacological interventions. On June 26, 2020, we gathered metadata on the included preprints and identified whether they had been published in a scholarly journal. We performed Mann-Whitney U tests to evaluate whether published articles differed from unpublished preprints in citation counts or in metrics, defined as PDF downloads and abstract reads.
Results: Our sample included 97 preprints, of which 23 were published in scholarly journals and 74 remained unpublished (publication rate of 23.7%). The most common study designs among preprints were basic science research and case series. The number of citations in our sample ranged from 0 to 1409 for published articles and from 0 to 175 for unpublished preprints. Published articles had a significantly higher number of citations than unpublished preprints (p=0.000013). We did not find a statistically significant difference in PDF downloads (p=0.167) or abstract reads (p=0.181). In the published articles, the time from posting on a preprint server to publication in a journal ranged from 0 to 98 days (median: 42.0 days). The time from date of submission to a journal to date of acceptance ranged from 1 to 228 days (median: 23 days). Almost half of the preprints that were subsequently published (47.8%) had modifications made to the results section after peer review.
Conclusions: The publication rate of the preprints in this sample was low (1 in 4), although review times in scholarly journals seem to be accelerated. There was no difference in downloads between preprints that were published and those that were not.


Introduction
A preprint is a preliminary report that is shared publicly before it has been peer-reviewed. Most preprints have a digital object identifier (DOI) and can be cited in other research articles (Kaiser, 2019). There are multiple servers that host preprints, including bioRxiv and medRxiv, operated by Cold Spring Harbor Laboratory. BioRxiv is a free server for unpublished manuscripts related to biological sciences, while medRxiv focuses on medical, clinical and health science issues (Kaiser, 2019). The usage of information from preprints has been controversial. Advocates of preprints claim that this medium can accelerate access to scientific findings and improve the quality of published works by permitting faster feedback from the scientific community before publishing. They also argue that the audience for preprints is larger, because many articles published in scholarly journals do not have open access (Kaiser, 2017).
Those who are against the use of preprints state that many of these investigations may be flawed due to the lack of a peer-review process (Chalmers & Glasziou, 2016). Thus, readers should be aware that articles on preprint servers are not necessarily final versions and might contain errors, and that information reported in preprints has not been reviewed or endorsed by the scientific or medical community (Brainard, 2020; Kaiser, 2017, 2019).
Coronavirus disease 19 (COVID-19) was first described in December 2019 in Wuhan, China, and the World Health Organization (WHO) declared it a pandemic on March 12, 2020 (WHO, 2020). Due to the drastic increase in scientific production on this novel disease, scholarly journals have been overwhelmed, and in response, some journals have shortened their usual processes to reduce the time it takes to publish an article (Brainard, 2020). Nonetheless, due to the health emergency, authors have been urged to share relevant information as soon as possible; thus, an increasing number of studies have been posted as preprints (Rubin et al., 2020). In addition, many scholarly journals have required that authors initially share their work on a preprint server before it goes through peer-review and editing processes (Fidahic, Nujic, Runjic, Civljak, Markotic, Lovric Makaric, et al., 2020).
A recent analysis identified that articles published in scholarly journals during the COVID-19 pandemic were mostly case reports, while articles posted on preprint servers mostly included modeling studies (Fidahic, Nujic, Runjic, Civljak, Markotic, Lovric Makaric, et al., 2020). However, the number of preprints that will actually be published after a peer-review process remains uncertain.
The purpose of our study is to show how many preprints regarding pharmacological interventions against COVID-19 posted from January to March 2020 were published in a scholarly journal. We also intend to evaluate whether there were differences in how often preprints and peer-reviewed articles were cited, and whether preprint metrics (abstract reads and PDF downloads) differed between preprints that remained unpublished and those that were eventually published.

Search methodology
We conducted a search on medRxiv and bioRxiv, two of the most prominent distribution servers for preprints related to health sciences (Else, 2019), to identify preprints addressing or reporting data on pharmacological interventions for COVID-19 that had been posted on these servers from January 1, 2020 to March 31, 2020. The search was initially conducted on May 16, 2020; however, we updated the publication status, number of citations and metrics data on June 26, 2020.
We used the advanced search tool on medRxiv and bioRxiv to limit the time period, and used keywords to find preprints that were related to COVID-19 therapeutic interventions. We used keywords relating to pharmacological interventions that have been studied for COVID-19 based on previous narrative reviews, such as "Remdesivir" or "Hydroxychloroquine" (McCreary & Pogue, 2020; Nicolalde et al., 2020). Boolean operators were not available. We searched for each keyword individually and did not combine it with keywords related to SARS-CoV-2 infection, as we wanted to initially obtain as many preprints as possible. After this preliminary search we identified 542 preprints (Figure 1).

Eligibility criteria
After excluding duplicates, we manually reviewed each abstract, and the full text if needed, to determine whether the preprints were related to COVID-19 pharmacological interventions. We decided to include any study design, including case series that reported and/or analyzed pharmacological interventions, as case series may motivate the development of studies of a stronger design to test a given treatment (Chan & Bhandari, 2011). We excluded all studies that a) were not related to SARS-CoV-2 infection, or b) were not related to pharmacological interventions. After applying eligibility criteria, we included 97 preprints in our study.

Data collection
For each included preprint, we gathered information on the preprint server that hosted it (medRxiv or bioRxiv), DOI, title, list of authors, date of first posting on the server, and metrics (PDF downloads and abstract reads).
To identify preprints that were published in a scholarly journal, we followed this process. MedRxiv and bioRxiv usually add a link to the published version of a preprint if available; however, the published version may not be recognized and linked due to changes in the title or the list of authors. If a link to a published version was not available, we conducted an individual search on Google Scholar using the title of the preprint and the name of the corresponding author to verify that we identified all preprints that were subsequently published. After identifying all the preprints that were published in scholarly journals, we gathered information including the new DOI, journal of publication, date of publication, date of submission and date of acceptance. If available, we also obtained the mean time to acceptance in these journals in 2019, before the COVID-19 crisis.
To obtain information on how often preprints or articles published in scholarly journals were cited, we conducted an individual search on Google Scholar for each article. We decided to use Google Scholar as it hosts both preprints and published articles, and citation information is readily available. For preprints that were subsequently published in a scholarly journal, we included in the analysis only the number of citations of the published version. The metrics data included in our study were gathered from the preprint servers (bioRxiv and medRxiv), regardless of the publication status of these preprints. We collected data on the number of citations and metrics on June 26, 2020.

Statistical analysis
We conducted a Mann-Whitney U test, a non-parametric test, to assess whether there was a difference in how often preprints and articles published in scholarly journals were cited. We also used a Mann-Whitney U test to determine whether preprints that were eventually published in a scholarly journal had higher metrics (provided by the preprint servers) than preprints that were not published. We performed all our statistical analyses in IBM SPSS Statistics for Windows, version 25 (IBM Corp., Armonk, N.Y., USA).
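As a rough illustration of the comparison described above, the following Python sketch computes a two-sided Mann-Whitney U test using midranks and a tie-corrected normal approximation, without a continuity correction. It is not the SPSS procedure used in the study, and its p-values may differ slightly from those of other software. The function name `mann_whitney_u` and the citation counts are hypothetical and serve only to show the mechanics.

```python
import math
from itertools import groupby


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda pair: pair[0])

    rank_sum_x = 0.0
    tie_term = 0.0
    position = 0  # number of observations already ranked
    for _, tied in groupby(pooled, key=lambda pair: pair[0]):
        tied = list(tied)
        t = len(tied)
        # Tied values share the average of the ranks they occupy.
        midrank = position + (t + 1) / 2
        rank_sum_x += midrank * sum(1 for _, g in tied if g == 0)
        tie_term += t ** 3 - t
        position += t

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every observation is identical
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Hypothetical citation counts, not the study data.
published_citations = [65, 13, 90, 120, 40]
unpublished_citations = [9, 3, 17, 5, 12, 1]
u_stat, p_value = mann_whitney_u(published_citations, unpublished_citations)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because the test works on ranks rather than raw values, a single extreme citation count changes the result no more than any other value that ranks highest, which is one reason a rank-based test suits skewed citation data.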

Results
Of the 97 preprints included in our study, 23 were published in a scholarly journal, while 74 remained unpublished as of June 26, 2020. The publication rate in this sample was 23.7%. The time between the date on which preprints were first posted on the preprint servers and the date on which we collected data ranged from 87 to 150 days (median: 119 days). Figure 2 shows that the most common study types in preprints were basic science research (n=39; 40.2%) and case series (n=35; 36.1%). Among the preprints that were eventually published in scholarly journals, we identified basic science research (n=11; 47.8%), case series (n=8; 34.8%), reviews (n=2; 8.7%), a systematic literature review (n=1; 4.3%) and an interventional study (n=1; 4.3%).
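The proportions and the minimum follow-up reported in this paragraph can be re-derived from the stated counts and dates. The short Python sketch below does so. The variable names are illustrative, and the follow-up calculation assumes that the most recently posted preprint appeared on the last day of the search window (March 31, 2020), which the text does not state explicitly.

```python
from datetime import date

included = 97
published = 23
unpublished = included - published

# Publication rate as a percentage of all included preprints.
publication_rate = 100 * published / included


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Study types among all preprints and among published preprints.
basic_science_all = pct(39, included)
case_series_all = pct(35, included)
basic_science_published = pct(11, published)
case_series_published = pct(8, published)

# Minimum follow-up, assuming a preprint posted on the last eligible day.
data_collection = date(2020, 6, 26)
last_eligible_posting = date(2020, 3, 31)
min_follow_up_days = (data_collection - last_eligible_posting).days

print(unpublished, round(publication_rate, 1), min_follow_up_days)
```

Under that assumption, the computed interval agrees with the reported minimum of 87 days.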
The number of citations in our sample ranged from 0 to 1409 for articles published in scholarly journals, and from 0 to 175 for unpublished preprints. We initially did the statistical analysis for the number of citations and metrics on our complete data set (Table 1). In order to increase the statistical power of our study, we decided to perform a separate analysis excluding uncited articles (n=10; 10.3%) and one article within the published group that had 1409 citations, as it was considered an outlier (Table 1). The article we excluded was a non-randomized clinical trial that assessed the use of hydroxychloroquine and azithromycin as a therapeutic intervention against COVID-19 (Gautret et al., 2020).
After excluding these data, articles published in scholarly journals had a significantly higher number of citations (median: 65.5; IQR: 13-90.25) than unpublished preprints (median: 9; IQR: 3-17; p=0.000013). We did not identify a significant difference in the number of abstract reads (p=0.181) or PDF downloads (p=0.167) when we compared unpublished preprints to preprints that were eventually published in a scholarly journal (Table 1). Statistical significance did not vary when compared to our initial analysis on the complete database.
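The exclusion step and the summary statistics above can be sketched in Python as follows. This is only an illustration: the citation counts are hypothetical, the helper `summarize` is not part of the study, and quartile conventions differ between software packages, so the IQR bounds produced here need not match those of SPSS.

```python
from statistics import median, quantiles


def summarize(citations, outlier_threshold=None):
    """Drop uncited items and optional outliers, then report median and IQR."""
    kept = [c for c in citations if c > 0]
    if outlier_threshold is not None:
        kept = [c for c in kept if c < outlier_threshold]
    # quantiles() sorts the data and uses its default "exclusive" method.
    q1, _, q3 = quantiles(kept, n=4)
    return {"n": len(kept), "median": median(kept), "iqr": (q1, q3)}


# Hypothetical citation counts, including two uncited items and one outlier.
example_citations = [0, 4, 12, 0, 30, 58, 1409, 75]
summary = summarize(example_citations, outlier_threshold=1000)
print(summary)
```

Reporting the median and IQR rather than the mean keeps the summary robust to the kind of extreme value that was excluded in the separate analysis.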
Among the 23 preprints that were published in a journal, the time that elapsed from the date of first posting on a preprint server to the date of publication in a scholarly journal ranged from 0 to 98 days (median: 42.0 days). For articles in which the data were available, the time from the date of submission to acceptance ranged from 1 to 228 days (median: 23 days). In 2019, before the COVID-19 crisis, the time from submission to acceptance in the same group of journals ranged from 27 to 193 days (median: 56.7 days) (Table 2).
We also found that almost half (n=11; 47.8%) of the preprints that were published had modifications made to the title or the results section, nine articles (39.1%) had modifications in the methods, and eight articles (34.8%) had modifications made to the discussion section. The list of authors changed in three articles (13%).

Discussion
Preprints are intended to accelerate access to preliminary data for the scientific community, mainly to receive rapid feedback prior to entering a peer-review process, which is a requirement for publication in the majority of indexed journals (Mudrak, 2020). MedRxiv and bioRxiv, widely known preprint servers (Else, 2019), have a disclaimer on their homepage that states: "these are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information".
Scientific production has drastically increased due to the COVID-19 pandemic. Approximately 900 articles, including published works and preprints, were published prior to March 12, 2020 (Callaway et al., 2020). A significant proportion of the preprint surge could be a result of scholarly journals requiring that works submitted for peer review be made simultaneously available to the public on a preprint server (Fidahic, Nujic, Runjic, Civljak, Markotic, Lovric Makaric, et al., 2020).
Although we identified a considerably shorter time to acceptance during the COVID-19 crisis in the group of journals we analyzed, the publication rate for the preprints included in our sample was only 23.7%. A recent preliminary analysis estimated that the average turnaround time for scholarly journals during the COVID-19 pandemic was 60 days (Horbach, 2020). Considering the date of posting on a preprint server, our sample had a median follow-up of 119 days (min: 87, max: 150) to determine publication status. Thus, we consider that the follow-up we provided was enough to identify the majority of preprints that would eventually be published, and that it is unlikely that the publication rate in our sample will be altered significantly.
In comparison to our findings, in previous infectious outbreaks such as Zika or Ebola, the publication rates for preprints in a peer-reviewed journal were approximately 60% and 48%, respectively. However, only 174 preprints were posted during the Zika outbreak (Nov 2015 to Aug 2017), while 75 preprints were posted during the Ebola outbreak (May 2014 to Jan 2016) (Johansson et al., 2018). As of June 28, 2020, 4651 preprints related to COVID-19 had been posted on medRxiv and 1189 on bioRxiv, evidencing the drastic increase in preprint production during the COVID-19 pandemic. As a matter of fact, the scientific community has never before produced so much non-peer-reviewed data (Dinis, 2020).
In our sample, we found that preprints that were eventually published in a scholarly journal had a significantly higher number of citations when compared to preprints that remained unpublished. Even though we did not directly evaluate the quality of the preprints, the number of citations is an indicator of scientific impact, which is one of the components of the concept of scientific quality (Aksnes et al., 2019). Thus, considering the publication rate and the lower citation count we identified in our sample, we could assume that some of these preprints may not have the quality needed to withstand the scrutiny of a peer-review process in order to be published in a scholarly journal. Further studies to directly assess the quality of preprints posted during the COVID-19 pandemic are required.
We did not find significant differences in terms of metrics when we compared unpublished preprints to preprints that were subsequently published in journals. This raises the concern that the scientific community, and the general population, may read and share preprints that do not have enough quality to pass a peer-review process.
We found that almost half of the preprints that were subsequently published had significant modifications in the results section, which suggests that preprints can change substantially after peer review. This raises concerns about the possibility of significant errors in the data analysis of preprints that are not peer-reviewed and published, as previously reported (Else, 2019; Heimstädt, 2020).
Some preprints might contain essential and time-sensitive information. For example, a study showed that the basic reproduction number, R0, calculated using data available in preprints was not different from the one estimated in peer-reviewed articles (Majumder & Mandl, 2020), and preprints on the viral sequence and structure have allowed for early investigation of potential therapeutic options and vaccines (Brainard, 2020; Kwon, 2020).
While there is widespread agreement that preprints could be useful in the current context, there are significant risks associated with the potential spread of faulty data without appropriate third-party screening (Dinis, 2020). The lack of a peer-review process in preprints may have important implications, because the basic screening process employed by preprint servers may not be enough to avoid the dissemination of flawed information (Rawlinson & Bloom, 2019). For example, a preprint that was posted on bioRxiv suggested significant molecular similarities between SARS-CoV-2 and HIV (Kwon, 2020). This preprint was later withdrawn from the server; however, by the time that happened, it had already sparked controversy and conspiracy theories. To our concern, we found that during the COVID-19 pandemic multiple preprints have been used in the development of clinical guidelines and public health policies.
In spite of the fact that peer review aims to be an exhaustive and thorough process that improves the quality of a manuscript, articles published in a peer-reviewed journal should not be taken as irrefutable knowledge. To illustrate this, a couple of peer-reviewed articles have been recently withdrawn from two prestigious journals due to significant concerns about primary data validity (M. Mehra et al., 2020; M. R. .
The main limitations of our study include the fact that we only included preprints on pharmacological interventions against COVID-19. In addition, we only used medRxiv and bioRxiv as preprint servers to obtain our sample. However, given the length of follow-up in our study, it seems unlikely that the publication rate would be altered significantly.

Conclusions
In our sample, most of the preprints remained unpublished, in spite of accelerated editing processes by multiple scholarly journals. Preprints that were eventually published had a significantly higher number of citations when compared to unpublished preprints, which suggests a higher scientific impact and quality of published preprints. This raises concern about the quality of preprints that did not go through peer review. The prevalent culture of "publications at any cost" and "publish or perish" might contribute to the unprecedented surge of preprints (Juyal et al., 2014).

Figure 1. Search strategy: We initially obtained 542 preprints. After removing duplicates and applying exclusion criteria, we included 97 preprints in the study.