The Data set for Patient Information Based Algorithm to Predict Mortality Cause by COVID-19

Data on COVID-19 in China, and subsequently in South Korea, were collected daily from several official websites. The collected data comprise 33 death cases in Wuhan city, Hubei province, during the early outbreak, together with confirmed cases and death tolls in specific regions chosen as representative of the coronavirus outbreak in China. Data were copied into Excel spreadsheets for analysis. A new methodology, the Patient Information Based Algorithm (PIBA) [1], was adapted to process the data and to estimate the death rate of COVID-19 in real time. The assumption is that the number of days from hospitalization to death follows a normal distribution, whose scores can be obtained by analysing the 33 death cases [2]. We selected 5 scores of this distribution as lagging days, which are used in the subsequent estimation of the death rate. We calculated a death rate on cumulative confirmed cases for each lagging day from the current data and then weighted every death rate by its corresponding probability to obtain the total death rate on each day. Where the trendlines of these death rate curves meet the curve of the current ratio between cumulative death cases and confirmed cases at some point in the near future, we considered these intersections to lie within the range of the real death rate. Six tables are presented to illustrate the PIBA method using data from China and South Korea. One figure shows the estimated rates of infection and of patients in serious condition, and a retrospective estimation of the initial occurrence time of COVID-19 based on PIBA.



Value of the data
• These data provide the scientific community with a new methodology to estimate the death rate and then predict death cases during an epidemic.
• Scientific researchers, CDC employees, government officers responsible for disease control and management, and the general public will benefit from these data.
• These data will be useful for studies aimed either at disease control management or at preparing the resources needed to combat an outbreak.
• Because the number of data samples collected in this article is limited, factors that might affect the death rate of an epidemic, such as the phase of the outbreak and the measures issued by the department of disease control, could be taken into account for further insights and for the development of experiments with larger amounts of data.

Data description
The data on 33 death cases in Table 1 were collected from the official website of the Health Commission of Hubei Province in China; they include the date of symptom onset, the date of admission to the ICU, and the date of death. With these data, the number of days from symptom onset to death and from ICU admission to death can be calculated. Assuming a normal distribution, the mean μ and standard deviation σ can be calculated as well. Thus the 5 selected scores (μ, μ ± σ and μ ± 2σ) of the normal distribution can be obtained as the basic elements for the following estimation and prediction of the death rate; they are 2, 8, 13, 19 and 25 days, respectively.
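The derivation of the five lagging days from a list of durations can be sketched as follows. This is a minimal illustration, not the authors' spreadsheet procedure: the function name `lag_scores`, the use of the sample standard deviation, the rounding to whole days, and the example durations are all assumptions (the durations are hypothetical and are not the 33 cases of Table 1).

```python
from statistics import mean, stdev


def lag_scores(durations):
    """Return mu-2s, mu-s, mu, mu+s, mu+2s, each rounded to whole days."""
    mu = mean(durations)
    sigma = stdev(durations)  # assumption: sample standard deviation
    return [round(mu + k * sigma) for k in (-2, -1, 0, 1, 2)]


# Hypothetical durations in days (NOT the data of Table 1)
example_durations = [10, 12, 14, 16, 18]
scores = lag_scores(example_durations)
```

Applied to the real durations of Table 1, a calculation of this kind is what yields the five lagging days reported in the text.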
The disease information in Table 2 was collected from public media before we applied the same method of death rate estimation and prediction to South Korea as to China [1]. We collected cumulative confirmed cases and deaths, as well as new confirmed cases and new deaths, in South Korea.
According to the analysis of Table 1, we selected 5 scores (μ, μ ± σ and μ ± 2σ) of the normal distribution, which are 2, 8, 13, 19 and 25 days, respectively. When we calculate the death rate by dividing death cases by confirmed cases, the confirmed cases should be those on the 2nd, 8th, 13th, 19th and 25th day before the day of prediction.
Death rate 1 is calculated by dividing new death cases by new confirmed cases. Death rate 2 is calculated by dividing cumulative death cases by cumulative confirmed cases. When a death rate came out negative or undefined, the new confirmed cases may have been reported incorrectly, or there were no new cases on the corresponding earlier day. We set a negative or undefined death rate to zero (in red) and then added the new death cases to those of a following day (in green).
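The lagged division and the zero-correction rule described above can be sketched as follows. This is an illustration under stated assumptions: the function name `lagged_death_rates` and the example series are hypothetical, and carrying uncounted deaths forward to the immediately following day is one reading of the correction rule, not a confirmed detail of the authors' tables.

```python
def lagged_death_rates(deaths, confirmed, lag):
    """Divide deaths on day t by confirmed cases on day t - lag.

    Negative or undefined rates are set to zero, and the deaths of that
    day are carried forward to the next day (assumed interpretation).
    """
    rates = []
    carry = 0
    for t in range(lag, len(deaths)):
        day_deaths = deaths[t] + carry
        lagged_confirmed = confirmed[t - lag]
        if lagged_confirmed <= 0 or day_deaths < 0:
            rates.append(0.0)   # corrected value
            carry = day_deaths  # added to the following day
        else:
            rates.append(day_deaths / lagged_confirmed)
            carry = 0
    return rates


# Hypothetical daily series (not taken from Table 2)
new_deaths = [0, 0, 1, 2, 3]
new_confirmed = [10, 0, 20, 40, 50]
rates = lagged_death_rates(new_deaths, new_confirmed, lag=2)
```

Passing new daily counts gives a series in the style of death rate 1; passing cumulative counts gives one in the style of death rate 2.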
Each score selected from the normal distribution has a specific probability when it is taken as a representative point on the bell curve [1]. When the death rates on a given day are weighted by their corresponding probabilities and summed, the total death rate on that day is obtained. Each curve of death rates has a trendline, and thus a formula describing its trend, as does the current ratio between cumulative death cases and confirmed cases on each day (Table 4).
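The weighting step can be sketched as follows. The source does not state how the probabilities are derived, so this sketch assumes weights proportional to the standard normal density at z = -2, -1, 0, 1, 2, normalized to sum to one; the names `normal_weights` and `total_death_rate` and the example rates are hypothetical.

```python
from statistics import NormalDist

Z_SCORES = (-2, -1, 0, 1, 2)  # mu-2s, mu-s, mu, mu+s, mu+2s


def normal_weights(z_scores=Z_SCORES):
    """Normalized standard-normal densities (assumed weighting scheme)."""
    densities = [NormalDist().pdf(z) for z in z_scores]
    total = sum(densities)
    return [d / total for d in densities]


def total_death_rate(rates_by_lag, weights):
    """Weighted sum of the death rates obtained with each lagging day."""
    return sum(rate * weight for rate, weight in zip(rates_by_lag, weights))


weights = normal_weights()
# Hypothetical death rates for the five lagging days on one day
example_total = total_death_rate([0.02, 0.03, 0.04, 0.05, 0.06], weights)
```

Under this assumption the central lag (μ) receives the largest weight and the two extreme lags (μ ± 2σ) the smallest.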
The current ratio between cumulative death cases and confirmed cases is calculated by dividing cumulative death cases by cumulative confirmed cases on each day.
The trendlines of death rate 1 and death rate 2 tend eventually to intersect the trendline of the current ratio, because the current ratio will equal the real death rate at the end of the epidemic. We considered that the intersection values of the three trendlines (death rates 1 and 2, current ratio) fall within the range of the real death rate. When we calculated the death rates separately with the formulas of their trendlines, two intersections were obtained (Table 5-B). We picked the larger of the two to predict new death cases in the following days (Table 6).
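The intersection step can be sketched as follows. The source does not say which trendline type was used, so this sketch assumes straight lines fitted by ordinary least squares; the names `fit_line` and `intersection` and all example series are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intersection(line_a, line_b):
    """Return (x, y) where two lines meet, or None if they are parallel."""
    (slope_a, icpt_a), (slope_b, icpt_b) = line_a, line_b
    if slope_a == slope_b:
        return None
    x = (icpt_b - icpt_a) / (slope_a - slope_b)
    return x, slope_a * x + icpt_a


days = [0, 1, 2, 3, 4]
rate1 = [0.050, 0.048, 0.046, 0.044, 0.042]  # hypothetical death rate 1
rate2 = [0.034, 0.033, 0.032, 0.031, 0.030]  # hypothetical death rate 2
ratio = [0.010, 0.012, 0.014, 0.016, 0.018]  # hypothetical current ratio

ratio_line = fit_line(days, ratio)
point1 = intersection(fit_line(days, rate1), ratio_line)
point2 = intersection(fit_line(days, rate2), ratio_line)
# The larger intersection value is the rate used for prediction
chosen_rate = max(point1[1], point2[1])
```

With a different trendline type (for example logarithmic or polynomial), only `fit_line` and `intersection` would change; the choice of the larger intersection value stays the same.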
This table lists the number of deaths from March 16, 2020 to March 22, 2020 based on lagging days of 8, 13, and 19 days. The upper parts are the predicted numbers of deaths based on

Experimental design, materials, and methods
The tables were produced with the Patient Information Based Algorithm (PIBA) [1]. PIBA was adapted to estimate the death rate of COVID-19 in real time from publicly posted data. Assuming a normal distribution, the different durations between symptom onset and death, with their different probabilities, were derived by analysing 33 death cases in Wuhan city, Hubei province, China [2]. Based on these results, the total death rate in a region can be calculated by combining the death rates obtained with the different durations.
Where the trendlines of these death rate curves meet the curve of the current ratio between cumulative death cases and confirmed cases at some point in the near future, we considered these intersections to lie within the range of the real death rate. The data analysis followed the normal distribution throughout, both in calculating the probability of every selected score and in estimating the death rate. After data on COVID-19 patients from South Korea were collected, they were analysed with the PIBA method as indicated above (Table 2). The death rate was first estimated (Table 3). The death rate was then calculated (Table 4). Following these estimations, the PIBA method was used to predict the number of deaths in the following week (Table 5). The predicted death numbers were then compared with the real death numbers (Table 6).
Fig. 1 was produced by the following procedure. Up to February 25, 2020, the total cumulative number of infected patients in China was 78,064 (data from mainland China only). The number of new cases per day had not increased in the preceding 9 days. The total cumulative number of people who had been in close contact with an infected person was 647,406. Thus, simply dividing the number of infected persons by the number of contacts gives a total infection rate of only 12%, considerably lower than expected; prior expectations had been much higher, based on multiple routes of infection [3,4]. Using our formula, the results indicate that the current infection rate is even lower than the rate based on the total numbers (see Fig. 1 A). The infection rate in Hubei province is currently around 4%, although previously it was as high as 39%. On average, the infection rate is about 4% in China overall, 10% in Hubei, and 0.46% in the rest of the country. Among inpatients, the rate of serious medical condition ranges from 10% to 30% (see Fig. 1 B), averaging 18% in China, 19% in Hubei, and 13% in the rest of the country (excluding Hubei).
Based on the estimated death rate, on January 22 there should have been a total of 150 to 300 inpatients (see Fig. 1 C). Based on the rate of patients who are severely ill among all patients, on January 2 there should have been 216 to 315 patients. Based on the effective infection rate, and assuming one week or 14 days from close contact to the onset of symptoms, there might have been 2,160 to 68,478 people infected around December 20, 2019. If the epidemic doubling time is taken to be approximately 6 days, the initial infection source may date back as early as November or October 2019.
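The overall infection rate quoted above, and the kind of doubling-time back-calculation behind the retrospective estimate, can be sketched as follows. The function names are hypothetical, and `days_since_single_case` assumes pure exponential growth from a single case with a constant doubling time, a simplification the source does not state and which need not reproduce the dates in the text.

```python
import math


def infection_rate(infected, close_contacts):
    """Share of close contacts who became infected."""
    return infected / close_contacts


def days_since_single_case(current_cases, doubling_time_days):
    """Days needed to grow from 1 case to current_cases.

    Assumes uninterrupted exponential growth (illustrative only).
    """
    return doubling_time_days * math.log2(current_cases)


# Figures from the text: 78,064 infected among 647,406 close contacts
overall_rate = infection_rate(78_064, 647_406)  # about 0.12, i.e. 12%

# Hypothetical example: 1,024 cases with a 6-day doubling time
example_days = days_since_single_case(1_024, 6)
```

The same division, applied to daily rather than cumulative counts, gives a current rate that can differ from this overall figure.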

13, and 19 days of the PIBA method. The lower part lists the predicted minimum and maximum number of deaths, and the actual reported deaths, in each of the seven days.

Fig. 1. Estimated rate of infection and patients in serious condition and retrospective estimation of the initial occurrence time of COVID-19 based on PIBA data. Rates of 2019-nCoV infection and rate of patients in serious medical condition: total rate in China (blue), Hubei (orange) and the rest of the country (grey). Numbers on the vertical axis indicate the percentage of infections; numbers on the horizontal axis indicate the date. Fig. 1 A. The infection rates. Fig. 1 B. The rate of patients in serious medical condition. Fig. 1 C. Retrospective estimation of the start time of the disease based on PIBA and known information on patients in Wuhan. * Wr = based on the rate of Wuhan; Rcr = based on the rate from the rest of the country; Dt = doubling time [9]; Ir = infection rate; Sr = serious rate; Dr = death rate.

Table 1
33 death cases in Wuhan city of Hubei province in China.

Table 2
Disease information in South Korea.

Table 3
Death rate estimation in South Korea.

Table 4
Current ratio between accumulative death cases and confirmed cases.

Table 5
Death rate estimation in South Korea.

Table 6
Deaths prediction by PIBA and actual death data in South Korea.