Evaluation of the Dynamics of Large Scale COVID-19 Related Literature through Bibliometric Analysis from a Mathematical Standpoint

This study aims to analyze the dynamics of published articles and preprints of COVID-19 related literature from different scientific databases and sharing platforms. The PubMed, ScienceDirect, and ResearchGate (RG) databases were considered in this study, each over a specific period. Analyses were carried out on the number of publications as a function of (a) time (day), (b) journals and (c) authors. Doubling time of the number of publications was analyzed for PubMed “all articles” and ScienceDirect published articles. The analyzed databases were (1A) PubMed “all articles” (01/12/2019-12/06/2020), (1B) PubMed review articles, (1C) PubMed clinical trials, (2) ScienceDirect all publications (01/12/2019-25/05/2020) and (3) RG (Article, Preprint, Technical Report) (15/04/2020-30/04/2020). Total publications in the observation period for PubMed, ScienceDirect, and RG were 23000, 5898 and 5393 respectively. The average number of publications/day for PubMed, ScienceDirect and RG were 70.0±128.6, 77.6±125.3 and 255.6±205.8 respectively. PubMed shows an avalanche in the number of publications around May 10, when the number of publications jumped from 6.0±8.4/day to 282.5±110.3/day. The average doubling time for PubMed, ScienceDirect, and RG was 10.3±4 days, 20.6 days, and 2.3±2.0 days respectively. The average number of publications per author for PubMed, ScienceDirect, and RG was 1.2±1.4, 1.3±0.9, and 1.1±0.4 respectively. In the subgroup analysis of PubMed review articles, the review time (minimum|mean±standard deviation|maximum) was <0|17±17|77> days, changing at a rate of -0.21 days/day. The number of publications related to COVID-19 until now is huge and growing very fast with time. It is essential to rationalize and limit the publications.


INTRODUCTION
A cluster of viral pneumonia cases of unknown cause, subsequently identified as a novel coronavirus, named as COVID-19 or 2019-nCoV, was detected on December 31, 2019 in Wuhan, China. [1] Subsequently, WHO declared it a global pandemic on March 11, 2020. [2] This virus has rapidly crossed borders and led to a major healthcare crisis and economic slowdown around the globe. Most international health organizations have stated an urgent need to stop, control and reduce the impact of the virus at every opportunity. [3] Healthcare systems, various sectors of industry, and the overall economy of the globe have been hit severely by this pandemic. As of July 23, 2020, there were 15 million confirmed cases worldwide with the number of fatalities in excess of 600,000 so far. [3] How has the scientific world reacted to this pandemic? This pandemic has brought to the fore the gaping incoherence in opinions expressed by agencies the world over. Government bodies, non-government institutions, the pharmaceutical industry and researchers have all made their voices heard, but without much unison.
Bibliometric analysis of COVID-19 scientific literature showed that one-third of the published papers were on clinical management with poor adherence to research priorities, and over sixty percent were opinion pieces not reporting original data. [26,27] The present study is limited to bibliometric analysis of temporal distribution (publication vs. days), publication doubling time, spatial distribution (publication vs. journal), and distribution of authorship (publication vs. author) in three different large medical databases. Further, we evaluated the dynamics of the growth characteristics of different parameters through the relevant mathematical formulation. The databases considered were PubMed, ScienceDirect and ResearchGate.

REVIEW OF LITERATURE
Coronavirus disease 2019 (COVID-19) has impelled the global scientific community to take unprecedented interest in a single subject, aimed at learning more about the disease, sharing knowledge immediately, and undertaking concerted, evidence-based efforts to mitigate the disease condition. Calls for ensuring that all research findings relevant to COVID-19 be made available openly and promptly had begun to appear as early as January. [28] In March, UNESCO mobilized countries to promote open science and data sharing to manage this crisis. Simultaneously, a global research roadmap for COVID-19 was issued by the World Health Organization (WHO), which pointed to the existing knowledge gaps. It has also drawn up timelines for the implementation of specific research actions.
Based on the declaration from the Wellcome Trust, all journals have made the COVID-19 literature free to access. [28] With the rapid increase in the number of publications, some of the journals have already filled their issues beyond December 2022, and most of the journals are filled until the December 2021 issue (as on 12-06-2020).
[11][12][29] Chahrour et al. analyzed the publications between 16/12/2019-16/03/2020 in PubMed for the country-wise research output. With specific inclusion criteria, they identified 564 articles from 39 different countries, with China producing the highest number of articles (67%) followed by the United States (7%); in total, 43% of all articles were "case reports". [8] Aristovnik et al. and Zyoud et al. analyzed the bibliometric results of the research output in the early stage of the outbreak of the pandemic for the Scopus database. [9,29][7] Given the urgent need for evidence to support clinical and public health decisions, investigators have begun summarising and analysing the published literature to collect existing evidence in the form of systematic reviews. [10][11][12] These bibliometric analyses have provided overviews of the COVID-19 research landscape. They have primarily focused on authorship, keywords, collaboration patterns and country of origin, and determined the top-cited publications, etc. [6,9] By one estimate, the COVID-19 literature published since January has reached more than 23,000 papers and is doubling every 20 days, among the biggest explosions of scientific literature ever. [4,5,13] Another study says the COVID-19 literature acceptance time and doubling time are 3 and 14 days respectively. [14] A quick search on Google Scholar for publications with the keyword "COVID-19", on the seventh known strain of the coronavirus that can infect humans, showed about 20,500 results for December 2019-May 2020. For the same period, PubMed yielded 23,480 results for the keyword "COVID-19"[All Fields] [as on 12-06-2020]. The other 6 strains of the same virus yielded around 10 thousand publications between 2000-2020. [10]
In addition to these staggering numbers, the pandemic has also affected the quality and content of publications. All editorial platforms and publishing houses have created structures to promote and expedite the publication of COVID-19 related articles. [7] Professional bodies, both national and international, of every specialty have attempted to bring out their own "guidelines" and "recommendations" to deal with patients of their specialty in the "COVID situation". Researchers seemed to write articles endlessly, often repeating and restating what was already written. [15] This also raises questions about the quality of the data, the authenticity of the results and the typically short peer review time. [4,5,14,15] On April 29, 2020, after a hurriedly conducted trial, the National Institute of Allergy and Infectious Diseases proposed Remdesivir as an effective drug for this virus. [16,17] Contrary opinions as well as serious criticism were directed at these studies, and some trials showed no benefit of using Remdesivir. [18,19][23][24][25] A further bibliometric analysis identified different characteristics of COVID-19 related articles, such as the top 10 countries, journals, institutions, research topic clusters and the 20 most highly cited articles. They identified 19,044 publications, which were distributed as: articles - 9,140 (48.0%); correspondence - 4,192 (22.0%); reviews - 1,797 (9.4%); editorials - 1,754 (9.2%); notes - 1,728 (9.1%); and 433 (2.3%) were miscellaneous.
Gong et al. constructed an evidence map of COVID-19 related medical articles from the PubMed and China National Knowledge Infrastructure databases from January 1, 2020 to March 8, 2020, and Liu et al. did so on the PubMed and Embase databases for the period 1 January to 24 March 2020. [6,11] Gong et al. found that by mid-February 2020, the number of articles in Chinese was 2.5 times the number of English articles, with the highest contribution from the People's Republic of China (PRC) comprising mainland China, Hong Kong, Macao, and Taiwan, followed by the United States (97 articles), and European countries such as the UK (27 papers), Germany (15 papers) and Italy (15 papers).
Such an excessive volume of literature in a short span of time calls for addressing some pertinent questions: (1) how is it possible to bring out such a large volume of literature in such a short time? [27] (2) what is the reliability of the data collection, analysis, and literature review? [7] (3) when articles, including randomized trials, are being accepted in a very short time, some showing same-day acceptance, what is the quality and reliability of the peer review process? [14,15] (4) how will it be possible to read and comprehend this large volume of literature to extract useful and actionable information? [30] (5) is this plethora of literature actually translating, or will it ever translate, into any medical, social, or economic benefit? [4,5] (6) how much of the information is real progress and how much is mere repetition? [33] Probably the answer to most of the above questions is in the negative. To further analyse the growth pattern of the COVID-19 related literature, the following study has been designed.

METHODOLOGY
Different medical databases, publication houses and sharing platforms were analyzed, each over a certain period of time, to obtain the different characteristics and dynamics of the published/uploaded articles. All the databases created a separate section to handle the COVID-19 related literature. The keywords in PubMed were "COVID-19" or "Coronavirus" or "Corona virus" or "Coronaviruses". [34] ScienceDirect and ResearchGate (RG) have a separate database for COVID-19 specific research. [35,36] The data for the different databases were extracted as a spreadsheet (PubMed) or saved first as an HTML file (ScienceDirect and RG), which was then converted into a spreadsheet. Several articles and write-ups with only a meager relationship to the disease were excluded from the analysis.

FINDINGS
PubMed: "all articles": Temporal distribution: Number of Publications (NOP) Vs Days
The total number of publications in PubMed: "all articles" was 23,480. Nonetheless, the PubMed display is limited to 10,000 articles only; therefore, the bibliometric analysis was limited to the displayed number of articles. [34] Figure 1a shows the number of publications until 13/06/2020 as a function of date and a 5-day moving average. The overall average (± standard deviation) number of publications is 70.0±128.6/day. The avalanche in the number of publications that occurred after May 10, 2020 is evident from the average number of daily publications, which jumped from 6.0±8.4 to 282.5±110.3 (Figure 1b), a 47-fold growth. Based on the characteristics of the cumulative number, the publications were divided into two groups: from 13/01/2020 to 10/05/2020 (Group-1) and from 11/05/2020 to 13/06/2020 (Group-2). The mathematical relationship between the cumulative number of publications and time (days) for Group-1 is parabolic (Supplementary Figure 1) with a leading coefficient a=0.09. In the post-avalanche Group-2 (Supplementary Figure 2), it changed to a linear relationship with a slope of m=280.2, close to the average number of publications per day (282.5) during that period.

PubMed: "all articles": spatial distribution (NOP vs. journals)
The analysis was limited to the last 10,000 articles published in 1868 different journals, as per the availability from PubMed.
The number of articles per journal was <1|15.2±10.3|191>, median=2. While 750 journals published a single article, 695 journals published 2 to 5 articles (2336 articles in total) and the remaining 423 journals published 6914 articles. The highest number of articles (191) was published by the British Medical Journal (BMJ), followed by the Journal of Medical Virology (138) and Dermatologic Therapy (113). Supplementary Table 1 lists the number of publications as a function of the journals.

PubMed: Doubling time (d_t)
The doubling time for the NOP was calculated considering 3 publications on February 11, 2020 as the base. Between 11/02/2020-13/06/2020 (123 days), a total of 12 doublings were observed ($2^{12}$), presented in Supplementary Figure 3. The mathematical relationship between the number of DT vs. time shows a linear relationship (= 10.3t - 0.7; t in days, fitting

Analysis of PubMed Clinical Trials
Between January-June 2020, PubMed showed a total of 17 randomized clinical trials (RCTs) (16 in English and 1 in Chinese language) published in 16 journals, with Lancet alone publishing 2 trials. However, after carefully reading each article, it was found that the publications from France and Italy were not RCTs and the total was reduced to 15, of which 13 articles were from China and 2 from Brazil. A total of 300 independent authors were involved in these articles; the median number of authors per article is 13 and the range is from 6 to 65. Of these, only 6 articles have information on submission and acceptance dates. The average and median review times were <0|10.7±15.3|41> days and 5.5 days respectively. Two review articles were from the same research group (Hainan General Hospital, Haikou, China) and were accepted in 0 and 2 days respectively in a single journal (Complementary Therapies in Clinical Practice). [37,38]

ScienceDirect
Temporal distribution: NOP Vs Days
ScienceDirect had published 5898 articles (excluding errata) in the observation period from 30/12/2019 to 25/05/2020. [35] The average NOP/day and the range were <1|77.6±125.3|767>, with a median of 27 publications/day.
Figure 3 shows the cumulative NOP (in log scale) and the differential NOP (in linear scale) as a function of time. The cumulative NOP varies quadratically with time, $\mathrm{NOP}(t) = -0.972t^{2} + 162.01t - 666.52$ ($t$ in days), with a fitting accuracy of 99.3%.
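A quadratic fit of this kind, together with its fitting accuracy, can be sketched as follows. The sketch assumes that the reported fitting accuracy corresponds to the coefficient of determination (R²) expressed as a percentage, which the text does not state explicitly. The function name `fit_quadratic` and the synthetic daily counts are illustrative only and are not the study's data.

```python
import numpy as np

def fit_quadratic(days, cumulative):
    """Least-squares fit of cumulative = a*t^2 + b*t + c.

    Returns (coeffs, r_squared), with coeffs ordered highest power first.
    """
    days = np.asarray(days, dtype=float)
    cumulative = np.asarray(cumulative, dtype=float)
    coeffs = np.polyfit(days, cumulative, deg=2)
    fitted = np.polyval(coeffs, days)
    ss_res = np.sum((cumulative - fitted) ** 2)             # residual sum of squares
    ss_tot = np.sum((cumulative - cumulative.mean()) ** 2)  # total sum of squares
    return coeffs, 1.0 - ss_res / ss_tot

# Synthetic example (NOT the study's data): daily counts that grow linearly,
# so the cumulative count is exactly quadratic in t (here t^2 + 6t).
days = np.arange(1, 61)
daily = 2.0 * days + 5.0          # hypothetical daily number of publications
cumulative = np.cumsum(daily)
coeffs, r2 = fit_quadratic(days, cumulative)
print(coeffs, f"fitting accuracy = {100 * r2:.1f}%")
```

With real data, `days` would be the day index of the observation period and `cumulative` the running total of publications exported from the database.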

Doubling time:
The cumulative number of publications in ScienceDirect underwent 13 doublings, with an average doubling time of 20.6 days. The cumulative number of articles as a function of the number of doubling-time (DT) instances fits an exponential growth, $0.7198e^{0.6931n}$ ($n$ is the number of DTs), presented in Supplementary Figure 4.
NOP vs authors:
ScienceDirect identifies a total of 27,845 authors, with 22,675 authors having a single article, 3,849 authors having 2 articles, 1,297 authors having 3-10 articles and 23 authors having 10+ articles. The maximum number of articles for a single author was 34. The mean and range of the number of articles per author is <1|1.3±0.9|34>, median = 1 article. Supplementary Table 2 shows the number of publications as a function of the author (ScienceDirect).
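The doubling instances behind such an analysis can be extracted from a cumulative series roughly as sketched below. The helper name `doubling_days`, the base count and the synthetic series are assumptions made for illustration; the exact procedure used in the study is not described. The sketch also checks that the fitted exponent 0.6931 equals ln 2 to four decimals, so the fitted form is equivalent to 0.7198·2^n, that is, one doubling per DT instance.

```python
import math

def doubling_days(cumulative, base):
    """Return the day index at which the cumulative count first reaches
    2*base, 4*base, 8*base, ... (one entry per doubling instance)."""
    days = []
    target = 2 * base
    for day, count in enumerate(cumulative):
        while count >= target:    # a single day may cross several doublings
            days.append(day)
            target *= 2
    return days

# Synthetic series (NOT the study's data): the count doubles every 10 days.
series = [3 * 2 ** (d // 10) for d in range(101)]
instances = doubling_days(series, base=3)
intervals = [b - a for a, b in zip([0] + instances, instances)]
mean_dt = sum(intervals) / len(intervals)
print(instances, mean_dt)

# The fitted exponent is ln 2 to four decimals, so exp(0.6931*n) is close to 2**n.
fitted_13 = 0.7198 * math.exp(0.6931 * 13)
exact_13 = 0.7198 * 2 ** 13
print(fitted_13, exact_13)
```

Under this reading, 0.7198·2^13 is about 5897, which is consistent with the 5898 ScienceDirect articles reported after 13 doublings.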

ResearchGate (RG)
Like all other medical databases, RG also created a special section for all COVID-19 related literature. [36] This string directs researchers and readers to all the available COVID-19 literature without any keyword search. Between 15/04/2020 and 30/04/2020, a total of 5,395 COVID-19 related documents were uploaded to RG. After carefully scrutinizing all entries and eliminating repetitions, errata, presentations and comments, the number of useful documents was reduced to 4,180. A total of 1,986 documents had full text. With different keywords (multiple options available), 635, 1623, 744, 928 and 2180 documents were tagged as basic science, diagnosis, drug and vaccination, social and economic impact, and public health respectively.

Temporal distribution: NOP Vs Days
The temporal distribution of the uploaded documents, shown in Figure 4, yielded a linear relationship when plotted against time. The mean and median number of publications/day were <40|258.4±130.8|441> and 273/day respectively. The NOP as a function of days follows a straight line, with a rate of increase of 294.1 NOP/day (Figure 4). The documents related to basic science, diagnosis, drug and vaccine development, social and economic impact, public health, and treatment (with multiple options) identified through keyword search were 635, 1622, 743, 927, 2179, and 1318 items respectively.

Doubling time:
During the last half of April 2020, RG encountered 6 doublings in the number of uploaded documents, with an average DT of <1|2.3±2.0|6> days. Supplementary Table 3 shows the number of documents against the authors. A total of 3,537 authors in RG have a single article, whereas 257 have 2, 18 have 3, and another 18 have 4 to 9 documents.

DISCUSSION
The growth pattern presented in this article, along with all previous articles, indicates that the volume of literature is a singular and perhaps unprecedented event, at least at this scale. This tendency to get an article published somehow is seen across the board, from established researchers of the most reputed medical schools to undergraduate scholars from little-known universities. It is therefore not surprising that reputed journals had to carry out some of the biggest ever retractions of scientific papers in modern history, a grim testimony to the chinks in the peer review process [23][24][25] and in the robustness of quality checks and audits of the submissions. [25] Making use of the prevailing situation, some new predatory journals, usually paid journals, have also popped up to present these pseudoscientific facts. [39,40][42][43] Bonini et al. analyzed 11,000 publications until 12th May 2020 in PubMed and found one COVID-19 related publication every 6 min. They analyzed the clinical trials associated with different monoclonal antibodies and found 147 trials for Chloroquine/Hydroxychloroquine, 39 for the efficacy of tocilizumab, 23 for the use of systemic corticosteroids, and multiple other studies analysing the same aspect of different drugs. They concluded that such an overwhelming number of similar trials originated from the absence of guidelines for more harmonized clinical research. [32] Similar work on the analysis of clinical trials was also done by Hillel et al. [42,43] They analysed 285 registered interventional clinical trials in 18 large trial registry databases associated with the treatment and prevention of COVID-19. They concluded that in the first 6 months of the pandemic many trials were registered and apparently completed. However, these trials failed to yield rapid results in the literature or on clinical trial registries.
Similar to our study, Ioannidis et al. analysed the rapid growth of COVID-19 publications in terms of the engagement of the scientific workforce on the Scopus database for 2 months, 1st January to 1st March 2020. [41] They found a total of 129,570 COVID-19 papers, which is 3.3% of the total of all scientific papers (3,963,55) across all scientific disciplines published and indexed in Scopus until 1st January 2020. [41] Odone et al. analysed published articles until May 2020. They found that as much as 60% of the published literature consisted of "opinion articles" without proper scientific evidence. Only 150 journals published half of the 10,000 published papers, which substantiates our findings. [26] Another study by Slim et al. analyzed 15,909 COVID-19 articles in the top 100 surgery journals between March-June 2020 and found that 83.4% of the articles were "opinion articles" and that 40% of the COVID-19 related articles were published in the top 10 journals. [27] All these investigating groups found this to be an unprecedented editorial situation caused by a pandemic. 60-80% of articles are opinion pieces such as editorials or viewpoints, narrative reviews, surveys, letters and guidelines, without proper scientific analysis or evidence. Only around 8% of articles were proper scientific articles, i.e., randomized trials and original articles (including meta-analyses) with structured methods and results.
As established by several authors in their early bibliometric analyses of COVID-19 literature, and in our study as well, PRC is the leading contributor of scientific articles. [6,8,9,11,12,41] A total of 87% of the controlled trials in PubMed originated from PRC; 90% of the investigators in controlled trials are of Chinese origin. This kind of runaway publication raises concerns regarding nexus and pressure, for which the biomedical community will need to reorient its approach to peer-reviewed publication. [5,13,26,32,33] The other alarming tendency among journals and reviewers is the very short review time, which was observed to change at a rate of -0.21 days with every passing day. Articles were often accepted on the same day or within a week. [5,14] A shorter review period is not acceptable, as it is nearly impossible to judge any article in such a short time. In the PubMed RCT section, we found two randomized controlled trials which were accepted within an average of 1 day. [37,38] The average review time for a review article in ScienceDirect journals reduced to 2 weeks instead of the usual 5-6 weeks. With a very short review time, the quality of studies will obviously be misjudged or highly compromised. Such under-reviewed publications will produce more clinical concerns in future.
We have presented data from a closed data sharing platform (PubMed, operated by the National Institutes of Health with no role for end users), data from one publishing house (ScienceDirect) and data from an open data sharing platform operated by the end users (ResearchGate). This mixed approach to bibliometric analysis helps to reflect the overall scenario across closed and open scientific data sharing platforms regarding the characteristic dynamics of COVID-19 data.

Mathematical Analysis
Other bibliometric analyses limit their reporting to the mean value or to a graphical representation of the temporal or spatial distribution of different parameters. [6,8,9,11,12,30] Unlike other bibliometric analyses, wherever possible we have introduced a mathematical function to identify the functional form of the COVID-19 publication pattern. Typical forms are a function of time, or a function of instances with time implicit. The mathematical forms used to characterize the growth pattern in this article are either a conic section (straight line and parabola) or a known series such as the exponential; the fitting accuracy was quite good, varying between 97.9% and 100%. This helps in two ways: first, it mathematically categorizes the distribution, and second, it shows the tendency of the variation and allows other parameters, such as the slope, to be calculated. For example, the cumulative number of publications as a function of doubling time for ScienceDirect (Supplementary Figure 4) shows an exponential increase; without fitting a mathematical function, it is not possible to categorize the distribution. Representation by a known mathematical function (conic section or series) provides an analytical form of the distribution, which helps to comprehend it. Nonetheless, it may not always be possible to present a suitable analytical form of the distribution. For example, in Figure 1a the distribution is random; it may be possible to fit a complex form such as a B-spline curve with low accuracy, but under such conditions it is difficult to predict the tendency or comprehend the distribution. In such a scenario the distribution is grossly represented by a mean change as a function of time.
Has the interest in publishing more on COVID-19 taken undue precedence over other subjects? The answer to this pertinent question can be judged from the fact that the number of publications on COVID-19 stands at 15,354, which far exceeds the number of publications on cancer (in all its derivatives), a paltry 4,718 (as on 14/07/2020).
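The categorization step described above can be sketched as a comparison of candidate functional forms by their fitting accuracy. The sketch assumes that accuracy means the coefficient of determination (R²), and the names `r_squared` and `compare_forms` as well as the synthetic series are illustrative rather than taken from the study. The exponential form is fitted as a straight line to the logarithm of the data, which is one common choice and not necessarily the one used here.

```python
import numpy as np

def r_squared(y, y_fit):
    """Coefficient of determination of a fit."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_forms(t, y):
    """Fit linear, quadratic and exponential forms; return {name: R^2}."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    for name, degree in (("linear", 1), ("quadratic", 2)):
        scores[name] = r_squared(y, np.polyval(np.polyfit(t, y, degree), t))
    # Exponential y = A*exp(k*t): straight-line fit to log(y), requires y > 0.
    k, log_a = np.polyfit(t, np.log(y), 1)
    scores["exponential"] = r_squared(y, np.exp(log_a) * np.exp(k * t))
    return scores

# Synthetic series (NOT the study's data) that doubles at every instance n.
n = np.arange(0, 14)
y = 0.72 * 2.0 ** n
scores = compare_forms(n, y)
best = max(scores, key=scores.get)
print(scores, best)
```

The form with the highest score is taken as the category of the distribution; when no candidate reaches an acceptable accuracy, the series is better summarized by its mean change over time, as noted above.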

CONCLUSION
We have presented the bibliometric analyses of the dynamics of COVID-19 related publications using different parameters through relevant mathematical formulation for three large databases. Our analysis, and a few previous articles, clearly demonstrate the skewed bias toward excessive, unwarranted publishing on the COVID-19 pandemic. [4,6,8,9,13,30,40] Though the disease has been declared a global pandemic, this does not warrant unwanted literature piling up continuously. Research groups, publishing houses, journals, reviewers and all associates need to realize that the race for publications related to COVID-19 needs to be pragmatic, and not a blind race for one-upmanship. Publishing articles based on scientifically unimportant, pseudoscientific, fabricated, unreliable and harmful facts just to increase the number of publications and citations is actually harmful to society and a disservice to the scientific community.

Figure 3: ScienceDirect: Cumulative and differential number of publications as a function of date.

Figure 2: PubMed Review Articles: Days in review (acceptance date minus received date) as a function of the submission date (the beginning date is 06-11-2019, prior to the acceptance).

Figure 1b: Publication as a function of date for all articles presented in PubMed.

Figure 4: Number of publications vs days in the last two weeks of April 2020 (ResearchGate). NOP increases at a rate of 294.1/day.

Supplementary Material (all): Public document available from GitHub. Link: https://github.com/biplabphy/COVID-19_Literature.git

-05-2020); (3) ResearchGate (Published article, Preprint, Technical Report) (15/04/2020-30/04/2020). The dynamics of COVID-19 related documents in the PubMed, ScienceDirect and ResearchGate databases were analyzed for the number of publications as (i) temporal distribution (publication vs. days), (ii) spatial distribution (publication vs. journal), and (iii) authorship (publication vs. author). The number of publications as a function of doubling time was analyzed for PubMed "all articles" and ScienceDirect published articles. Analysis of total review time (submission date to acceptance date) was done for 150 review articles published in ScienceDirect and for COVID-19 related controlled trials presented in PubMed.