Open-source analytics tools for studying the COVID-19 coronavirus outbreak

To provide convenient access to epidemiological data on the coronavirus outbreak, we developed an R package, nCov2019 (https://github.com/GuangchuangYu/nCov2019). Besides detailed real-time statistics, it offers access to three data sources with detailed daily statistics from December 1, 2019, for 43 countries and more than 500 Chinese cities. We also developed a web app (http://www.bcloud.org/e/) with interactive plots and simple time-series forecasts. These analytics tools could be useful in informing the public and studying how this and similar viruses spread in populous countries.

As demonstrated in Supp. Doc. 1, the package also provides functions to facilitate data visualization. For example, with a single command, users can plot the distribution of cases on maps of the world, of China, and even of individual provinces (Figure 1). With historical data, we can combine temporal and spatial information into an animation that helps us understand disease transmission and examine the spread of the COVID-19 outbreak.
To enable users to access these datasets without coding, we also developed interactive web apps in both English [9] and Chinese [10]. As demonstrated in Supp. Doc. 1, these apps can also be run locally from RStudio. Using these apps, users can quickly generate all 23 plots in Supp. Doc. 2 from daily updated data. Complementing the dashboard by Dong et al. [3], our web app lets users select their regions of interest and examine both historical and real-time data. Generated by the app on February 25, 2020, Figure 2 shows that the total confirmed cases in the provinces outside Hubei are stabilizing, following a similar trend. The extreme measures the Chinese government has taken since January 23 appear to be working.
Built with the RStudio Shiny framework, these apps contain a simple forecast module. We first converted the log-transformed numbers of cases or deaths into a time series, then used the exponential smoothing method (ets) in the R package forecast [11] with default settings to forecast the totals. On February 7, 2020, this simple model predicted that the death toll would reach 2,000 within ten days, a staggering number at the time that, unfortunately, later materialized. We also converted the raw numbers of cases into percent daily changes and conducted a similar forecast. Interestingly, daily percent changes in both confirmed cases and deaths in China are decreasing linearly, apart from a few outliers (see Figures 16 and 18 in Supp. Doc. 2).
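The forecast step can be illustrated with a minimal Python sketch. It is not the app's code: R's `ets()` with default settings selects an exponential smoothing model automatically and estimates its parameters from the data, whereas this sketch assumes a fixed Holt linear-trend model with hypothetical smoothing parameters `alpha` and `beta`. The helper names (`holt_forecast`, `forecast_cumulative`, `percent_daily_change`) are illustrative. Counts are log-transformed, smoothed, extrapolated, and back-transformed; the conversion to percent daily changes is shown alongside.

```python
import math
from typing import List


def holt_forecast(series: List[float], horizon: int,
                  alpha: float = 0.8, beta: float = 0.2) -> List[float]:
    """Holt's linear-trend exponential smoothing with fixed parameters.

    Returns point forecasts for steps 1..horizon after the last observation.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # initial trend from the first difference
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step prediction.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Update the slope from the change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def forecast_cumulative(counts: List[float], horizon: int,
                        alpha: float = 0.8, beta: float = 0.2) -> List[float]:
    """Forecast cumulative counts on the log scale, then back-transform.

    All counts must be positive, since log(0) is undefined.
    """
    log_counts = [math.log(c) for c in counts]
    log_fc = holt_forecast(log_counts, horizon, alpha, beta)
    return [math.exp(v) for v in log_fc]


def percent_daily_change(counts: List[float]) -> List[float]:
    """Percent change from each day to the next (previous day must be nonzero)."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Hypothetical cumulative counts that double every day.
    demo = [100 * 2 ** i for i in range(10)]
    print(forecast_cumulative(demo, horizon=3))
    print(percent_daily_change(demo))
```

For a purely exponential series, the log-scale trend is exactly linear, so this model extends it unchanged; on real data, the fitted slope reacts to a slowdown in growth at a speed governed by `beta`.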
Even though not all data sources are official statistics, such detailed data offer a unique opportunity to study this novel pathogen. The hundreds of cities could even be considered semi-independent outbreaks, as many of them are far from the epicenter and have effectively been on lockdown since the end of January 2020. As shown in Figures 5 and 6 in Supp. Doc. 2, the death rate in Wuhan, estimated by dividing current total deaths by total confirmed cases, is 4.47%. Probably due to an overwhelmed healthcare system, this is higher than the average of 2.92% (95% confidence interval: 2.35% to 3.38%) observed in the 22 Chinese cities with 200 or more confirmed cases. Cities in Hubei province have higher fatality rates than cities in other regions (Figure 6 in Supp. Doc. 2). Internationally, the death rate in Japan (2.50%) is close to that of Italy (2.60%), and both are lower than the 3.67% observed in China overall (Figure 17 in Supp. Doc. 2). The death rate in Iran is 9.63%, probably due to underreported cases.
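The crude death rate and the across-city average can be sketched in Python as follows. The city figures are hypothetical placeholders, not data from the package, and the text does not state how its confidence interval was computed; the normal-approximation interval for the mean used here is an assumption. The function names `crude_cfr` and `mean_cfr_ci` are illustrative.

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, Tuple


def crude_cfr(deaths: int, confirmed: int) -> float:
    """Crude death rate: current total deaths / total confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return deaths / confirmed


def mean_cfr_ci(cities: Dict[str, Tuple[int, int]], min_cases: int = 200,
                level: float = 0.95) -> Tuple[float, float, float]:
    """Mean crude death rate over cities with at least `min_cases` confirmed.

    `cities` maps a city name to (deaths, confirmed). Returns (mean, lower,
    upper), using a normal approximation for the mean (an assumption).
    """
    rates = [crude_cfr(d, c) for d, c in cities.values() if c >= min_cases]
    if len(rates) < 2:
        raise ValueError("need at least two qualifying cities")
    m = mean(rates)
    se = stdev(rates) / len(rates) ** 0.5     # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m, m - z * se, m + z * se


if __name__ == "__main__":
    # Hypothetical (deaths, confirmed) per city; city_C falls below the cutoff.
    demo = {"city_A": (10, 200), "city_B": (30, 1000), "city_C": (5, 100)}
    m, lo, hi = mean_cfr_ci(demo)
    print(f"mean {m:.2%}, 95% CI {lo:.2%} to {hi:.2%}")
```

A crude ratio like this ignores the lag between confirmation and death as well as any underreporting of mild cases, which is one reason such estimates differ between regions.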
The rapid, exponential growth phase in China spanned roughly from January 15 to February 15, 2020, when the number of confirmed cases skyrocketed 1,670-fold, from 41 to 68,500. Such rapid growth is now evident in South Korea, Italy, and Iran (Figure 3). Other countries with fewer cases but a sharp upward trend include Germany, Spain, and France. If the outbreak is not managed well, each of these and other countries could see tens of thousands of cases within weeks. Public health officials need to grasp the power of exponential growth.
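The arithmetic behind this growth phase can be made explicit with a short Python sketch. It assumes 31 days between January 15 and February 15 and constant exponential growth over that window, which is a simplification; `growth_summary` and `project` are hypothetical helper names, and the three-week projection is an illustrative scenario, not a forecast for any country.

```python
import math


def growth_summary(start: float, end: float, days: float):
    """Return fold change, average daily growth factor, and doubling time
    (in days), assuming constant exponential growth between two counts."""
    fold = end / start
    rate = math.log(fold) / days        # continuous daily growth rate
    daily_factor = math.exp(rate)       # multiplicative change per day
    doubling_time = math.log(2) / rate  # days needed for counts to double
    return fold, daily_factor, doubling_time


def project(current: float, doubling_time: float, days: float) -> float:
    """Project a count forward under a fixed doubling time."""
    return current * 2 ** (days / doubling_time)


if __name__ == "__main__":
    # China, January 15 to February 15, 2020 (counts from the text).
    fold, factor, dt = growth_summary(41, 68_500, 31)
    print(f"{fold:.0f}-fold, x{factor:.2f} per day, doubling every {dt:.1f} days")

    # Hypothetical scenario: 1,000 cases doubling every 3 days, three weeks on.
    print(f"{project(1_000, 3, 21):,.0f} cases")
```

Under these assumptions, confirmed cases in China grew by a factor of about 1.27 per day, doubling roughly every 2.9 days; at a similar pace, a country with 1,000 cases would pass 100,000 within three weeks.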
Currently, city-level historical data are only available for China. These data sources occasionally change their data formats, which requires us to monitor them continually. If the …

Supplementary Document 1: Detailed tutorial and examples of how to use the R package.
Supplementary Document 2: Examples of plots obtained from our web app.
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.