Modeling and analysis of recovery time for the COVID-19 patients: a Bayesian approach

Abstract The ongoing COVID-19 pandemic has changed every aspect of life. Most people who contract COVID-19 experience mild to moderate symptoms, but some become seriously ill, and the illness can sometimes lead to a very painful death. The Fréchet distribution is one of the flexible distributions for survival time. Hence, in this article, the recovery time of COVID-19 patients is modeled by a new Fréchet-exponential (FE) distribution, and the parameters of the distribution are estimated in the classical and Bayesian paradigms. Since the Bayes estimators using informative priors are not available in closed form, the Lindley and Tierney–Kadane approximation methods are used for their evaluation. The results obtained through simulation studies and the COVID-19 data set indicate the superiority of the Bayes estimators over the classical estimators in terms of minimum risks. It is shown mathematically and graphically that the proposed model fits the data set appropriately. The minimum values of the Akaike information criterion, Bayesian information criterion, corrected Akaike information criterion, and Hannan-Quinn information criterion show that the FE distribution fits the data on the recovery time of COVID-19 patients better than the competing distributions.


Introduction
In 2019, as mentioned by Zhu et al. (2020), a novel deadly virus originated in Wuhan, China, and spread first locally and then across the whole world. The virus was named SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), and the disease it causes is known as COVID-19. By 2020, the virus had spread all over the world. Lutchmansingh et al. (2021) mentioned that COVID-19 has infected a huge population around the world. It has infected people of all ages, but elderly people and people with certain acute diseases are at higher risk of getting infected. COVID-19 spreads rapidly because it is transmitted from person to person by droplets released when an infected person coughs or sneezes.
Medical reports have demonstrated that the initial symptoms of the virus include coughing, high fever, fatigue, sore throat, body aches, breathlessness, and severe weakness. If the patient is not treated in time, the condition may worsen to severe respiratory disorders, multi-organ dysfunction, and even death. Infected people are mostly found to be symptomatic, but certain asymptomatic patients have also been reported (WHO, 2020a). An asymptomatic COVID-19 patient can act as a carrier, transmitting the virus to healthy people (Rothe et al., 2020). Asymptomatic patients have a lower viral load in their nasal cavity than symptomatic patients (Zou et al., 2020). The WHO (2020b) has advised that the only way to stay safe from this deadly coronavirus is to keep a distance from everyone, wear surgical masks, sanitize your hands, and avoid touching public surfaces, as the virus may remain viable for days on any surface if the environmental conditions are optimal for it. However, different disinfectants have been found effective against the virus, including hydrogen peroxide and hypochlorite, as mentioned by Kampf, Todt, Pfaender, and Steinmann (2020).
Studies like the one conducted by Jordan, Adab, and Cheng (2020) to estimate the COVID-19 cases reported in the hospitals have shown that the death rate of COVID-19 patients is higher among the elderly population and those with weaker immune systems. As it is a new phenomenon, many studies have been carried out recently not only to study the properties of the coronavirus but also to study the rate of transmission of the virus, the survival time and length of stay of COVID-19 patients, etc. Nemati, Ansary, and Nemati (2020) studied the discharge time of COVID-19 patients and, using a survival model, predicted the length of stay of patients in a hospital. Levy et al. (2020) predicted the survival of hospitalized COVID-19 patients through clinical characteristics and mortality risk factors. Car, Baressi Šegota, Anđelić, Lorencin, and Mrzljak (2020) modeled the spread of COVID-19 using machine learning solutions, namely multilayer perceptron and artificial neural network techniques. Niazkar, Eryılmaz Türkkan, Niazkar, and Türkkan (2020) applied three predictive models to forecast the outbreak of COVID-19 in Iran and Turkey. Zuo, Khosa, Ahmad, and Almaspoor (2020) proposed a new flexible extended-X (NFE-X) family of distributions, and one member of this family, the NFE-Weibull distribution, was used to model the total deaths of COVID-19 patients in Asian countries. Tang and Wang (2020) developed a mathematical modeling approach (5-day moving average) for the daily growth rate of COVID-19 cases in the United States. Nesteruk (2020) predicted the number of coronavirus victims using mathematical modeling. Din, Li, Khan, and Zaman (2020) proposed a model that gives better constraints on understanding the peaks of coronavirus spread. Almongy, Almetwally, Aljohani, Alghamdi, and Hafez (2021) analyzed the mortality rate of COVID-19 patients using the extended odd Weibull Rayleigh (EOWR) distribution. Haj Ismail, Dawi, Jwaid, Mahmoud, and AbdelKader (2021) studied the spread of COVID-19 in the United Arab Emirates. AlSayegh and Iqbal (2021) investigated the effect of precautionary and preventive measures on the limited spread of coronavirus in Bahrain. Bakar et al. (2021) developed the lock-down model in ASEAN countries by using hybrid ARIMA-SVR and hybrid SEIR-ANN.
The Fréchet distribution proposed by Fréchet (1927) is one of the important extreme value distributions and has found wide application in natural phenomena like earthquakes, floods, sea waves, rainfall, etc. It is also one of the flexible distributions for survival analysis. Krishna, Jose, Alice, and Ristić (2013) introduced a new Marshall-Olkin Fréchet distribution and, after studying its statistical properties, applied it to a real-life data set about the survival times of pigs. Shafiq et al. (2021) proposed a modified Kies-Fréchet distribution to examine and model the COVID-19 mortality rates in Canada and the Netherlands. Abbas et al. (2019) used the three-parameter Fréchet distribution for the analysis of survival times of a group of patients suffering from head and neck cancer. Kutal and Qian (2018) studied a long-term survival non-mixture model of the Fréchet distribution and showed that it is the best for modeling the data on allogeneic marrow transplantation patients. Yousof, Altun, and Hamedani (2018) derived the odd log-logistic Fréchet distribution and applied it to a heart transplant data set. Korkmaz, Yousof, and Ali (2017) proposed the odd Lindley-Fréchet distribution and used it to model a real-life data set of the survival time (in weeks) of acute myelogenous leukaemia patients.
Bayesian statistics updates a person's belief in light of new data. The Bayesian workflow is based on three steps. The first is the prior distribution, which provides the prior knowledge about the parameters. The second is the likelihood function, which is the information about the parameters contained in the available data. The third is the posterior distribution of the parameters, which is formulated via the Bayes theorem by combining the prior distribution with the likelihood function. In the last few years, Bayesian statistics has found applications in nearly every field because of its novel nature. Many statisticians, such as Bherwani et al. (2021), De Oliveira et al. (2020), and Manevski et al. (2020), have conducted Bayesian studies for the research and analysis of COVID-19 patients.
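The three steps of this workflow can be illustrated with a minimal sketch. It assumes a toy model that is not part of this study: exponentially distributed data with an unknown rate, a hypothetical Gamma(2, 1) prior, and a grid approximation of the posterior. The data values and all function names are invented for illustration.

```python
import math

# Hypothetical durations in days (illustrative values, not the study data).
data = [3.0, 5.5, 2.1, 8.4, 4.7, 6.2]

def log_prior(rate, shape=2.0, prior_rate=1.0):
    """Step 1: log of an unnormalised Gamma(shape, prior_rate) prior density."""
    return (shape - 1.0) * math.log(rate) - prior_rate * rate

def log_likelihood(rate, xs):
    """Step 2: exponential log-likelihood of the observed data."""
    return len(xs) * math.log(rate) - rate * sum(xs)

# Step 3: the posterior is proportional to prior times likelihood.
# Here it is normalised numerically on a grid of rate values in (0, 3].
grid = [0.001 * i for i in range(1, 3001)]
log_post = [log_prior(r) + log_likelihood(r, data) for r in grid]
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]  # subtract the peak for stability
total = sum(weights)
posterior = [w / total for w in weights]

posterior_mean = sum(r * p for r, p in zip(grid, posterior))

# For this toy model the posterior is Gamma(shape + n, prior_rate + sum(x)),
# so the grid result can be checked against the exact posterior mean.
exact_mean = (2.0 + len(data)) / (1.0 + sum(data))
print(posterior_mean, exact_mean)
```

The conjugate toy model is chosen only because it allows the grid result to be checked exactly; the FE model considered in this article has no such closed form.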
The main objective of this study is to model the recovery time of COVID-19 patients using a suitable statistical distribution and to propose an efficient estimation technique to obtain estimators of the parameters of this distribution. The best estimators are selected so that health practitioners, researchers, and epidemiologists worldwide can use them for the analysis and better prediction of the recovery time of COVID-19 patients.
As mentioned above, the Fréchet distribution has played an important role in survival analysis. This and many other lifetime distributions have been successfully used to model and analyze survival and reliability data. In this study, we propose a new Fréchet-exponential (FE) distribution using the transformed-transformer (T-X) technique of Alzaatreh, Lee, and Famoye (2013) and use it to model the recovery time of a sample of COVID-19 patients. This model is analyzed and estimated using maximum likelihood and Bayesian estimation techniques. The expressions of the Bayes estimators of the parameters of the FE distribution are not available in closed form. Hence, the Lindley and Tierney-Kadane (T-K) approximation methods are used. For illustrative purposes, a data set of 117 COVID-19 patients is taken from the Nishtar Hospital, Multan, Pakistan, and the recovery time (the duration from a patient's admission to the hospital to the time of discharge) is recorded. It is shown mathematically and graphically that the proposed model adequately fits the data set. The performance of the Bayes estimators is observed to be better than that of the MLEs. The rest of the paper is organized as follows.
In Sec. 2, the FE distribution is derived, and its survival, hazard rate, and cumulative hazard rate functions are presented. In Sec. 3, the MLEs of the parameters of the distribution are obtained. In Sec. 4, the Lindley and T-K methods are used to obtain the Bayes estimators using informative priors under five loss functions. Sections 5 and 6 cover the simulation study and the real-life example of the survival time of COVID-19 patients. Finally, the study is concluded in Sec. 7.

The Fréchet-exponential distribution
Because of the importance of the Fréchet distribution in survival and reliability analysis, a new and more flexible FE distribution is suggested in this study. This distribution is derived using one of the transformed-transformer techniques given by Alzaatreh et al. (2013) and is used to model and analyze data consisting of the recovery times of COVID-19 patients. Alzaatreh et al. (2013) proposed the transformed-transformer technique (the T-X family of distributions), in which the PDF $\phi(t)$ of a continuous rv T is transformed using the transformer $W[\Psi(x)]$, which is a specific functional form of the CDF $\Psi(x)$ of another rv X. This transformer is a differentiable and monotonically non-decreasing function, and its functional form depends upon the support of the rv T. Alzaatreh et al. (2013) also defined the T-X family for various transformers under different supports of the rv T.
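For reference, the generic T-X construction can be sketched as follows. The notation here is an assumption made for this sketch and follows Alzaatreh et al. (2013) rather than this article: $r$ and $R$ are the PDF and CDF of T, $f$ and $F$ are the PDF and CDF of X, and $a$ is the lower end of the support of T. The block shows the general family only, not the specific FE expressions.

```latex
% Generic T-X family: CDF G and PDF g of the transformed distribution.
G(x) = \int_{a}^{W[F(x)]} r(t)\,dt = R\bigl(W[F(x)]\bigr),
\qquad
g(x) = \left\{ \frac{d}{dx} W[F(x)] \right\} r\bigl(W[F(x)]\bigr).

% Two transformers commonly listed when the support of T is [0, \infty):
W[F(x)] = -\log\bigl(1 - F(x)\bigr)
\quad\text{or}\quad
W[F(x)] = \frac{F(x)}{1 - F(x)}.
```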
In our study, the rvs T and X follow the Fréchet and exponential distributions, respectively. One of the transformers given by Alzaatreh et al. (2013) is used to transform the Fréchet distribution. The support of the rv T is $[0, \infty)$. The PDF and CDF of the transformed FE distribution are expressed in terms of $\beta$ and $\lambda$, the shape and inverse scale parameters of the Fréchet and exponential distributions, respectively. The shape of the distribution depends upon the values of the parameters. Figure 1 shows the PDF and CDF plots of the distribution and indicates that the distribution is unimodal and positively skewed.
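The composition of a Fréchet T with an exponential X can be sketched in code. This is only an illustration under stated assumptions: the transformer used by the authors is not recoverable from this text, so two candidate transformers for the support $[0, \infty)$ are shown, and the one-parameter Fréchet CDF $\exp(-t^{-\beta})$ is assumed for T. All function names are hypothetical.

```python
import math

def frechet_cdf(t, beta):
    """CDF of the one-parameter Frechet distribution, exp(-t**(-beta)), for t > 0."""
    return math.exp(-t ** (-beta))

def exponential_cdf(x, lam):
    """CDF of the exponential distribution with rate lam, for x > 0."""
    return 1.0 - math.exp(-lam * x)

def w_log(p):
    """Candidate transformer W(p) = -log(1 - p), mapping (0, 1) onto (0, inf)."""
    return -math.log(1.0 - p)

def w_odds(p):
    """Candidate transformer W(p) = p / (1 - p), mapping (0, 1) onto (0, inf)."""
    return p / (1.0 - p)

def tx_cdf(x, beta, lam, transformer):
    """T-X CDF G(x) = R(W(F(x))) with T ~ Frechet and X ~ exponential, x > 0."""
    return frechet_cdf(transformer(exponential_cdf(x, lam)), beta)

# Parameter values quoted in the simulation section of this article.
beta, lam = 0.6, 0.999
for x in (0.5, 2.0, 10.0):
    print(x, tx_cdf(x, beta, lam, w_log), tx_cdf(x, beta, lam, w_odds))
```

With the logarithmic transformer, $W[F(x)] = \lambda x$, so the construction reduces to a Fréchet distribution rescaled by $\lambda$; the odds transformer gives a genuinely different shape. Which of these, if either, matches the FE distribution of this article is an open assumption of the sketch.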

Survival analysis
In survival analysis, we analyze data that involve the time until an event occurs. The survival time is the duration from a predefined starting point to the occurrence of the event of interest. Similarly, the hazard function gives the rate of failure of a patient, component, or system, given that the failure has not occurred before time t. We have obtained the survival, hazard, and cumulative hazard functions of the FE distribution. In our case, the survival time represents the recovery time (i.e., the duration between the COVID-19 patient's admission to the hospital and discharge after recovery). Let a non-negative rv T representing survival time follow the FE distribution with PDF $f(t)$ and CDF $F(t)$. Then the survival function $S(t) = 1 - F(t)$ is the probability of surviving beyond time t. The hazard function $h(t) = f(t)/S(t)$ is the ratio of the probability density function to the survival function. The cumulative hazard function of the FE distribution is $H(t) = -\ln S(t)$. Figure 2 shows the survival, hazard, and cumulative hazard function plots for different values of the parameters. The chance of survival decreases as time increases. The hazard rate is high for times between 5 and 15, after which it appears to be constant.
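The relations between the density, survival, hazard, and cumulative hazard functions can be sketched numerically. Because the FE expressions are not reproduced here, the sketch uses the one-parameter Fréchet distribution, whose pdf $\theta x^{-\theta-1}\exp(-x^{-\theta})$ is quoted later in this article, as a stand-in. The value of theta and the function names are assumptions for illustration.

```python
import math

def frechet_pdf(t, theta):
    """Frechet pdf: theta * t**(-theta - 1) * exp(-t**(-theta)), for t > 0."""
    return theta * t ** (-theta - 1.0) * math.exp(-t ** (-theta))

def survival(t, theta):
    """S(t) = 1 - F(t) with F(t) = exp(-t**(-theta)); expm1 keeps precision for large t."""
    return -math.expm1(-t ** (-theta))

def hazard(t, theta):
    """h(t) = f(t) / S(t)."""
    return frechet_pdf(t, theta) / survival(t, theta)

def cumulative_hazard(t, theta):
    """H(t) = -log S(t)."""
    return -math.log(survival(t, theta))

theta = 2.0  # hypothetical shape value, not an estimate from the study
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, survival(t, theta), hazard(t, theta), cumulative_hazard(t, theta))
```

The same three helper functions apply to any lifetime model once its pdf and CDF are supplied, so the FE distribution could be substituted for the stand-in.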
To estimate the parameters of the distribution in the frequentist and Bayesian paradigms, the method of maximum likelihood and Bayesian estimation techniques are used; these are elaborated in the following sections.

Maximum likelihood estimators
Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from the FE distribution. The likelihood function is the product of the FE density evaluated at the sample points, and the log-likelihood function is its logarithm. The maximum likelihood estimators (MLEs) $\hat{\lambda}$ and $\hat{\beta}$ of the parameters of the FE distribution are obtained by differentiating the log-likelihood function with respect to each parameter and setting the derivatives to zero. The two resulting normal equations cannot be solved simultaneously in closed form. An iterative method is therefore required, and the Newton-Raphson iterative procedure is used to compute the MLEs.
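The Newton-Raphson idea can be sketched on a simpler one-parameter problem. The sketch below is not the FE likelihood: it assumes the one-parameter Fréchet pdf $\theta x^{-\theta-1}\exp(-x^{-\theta})$ as a stand-in, simulated data, and hypothetical function names, and it iterates on the score function until the update is negligible.

```python
import math
import random

def sample_frechet(n, theta, rng):
    """Inverse-transform draws: solve exp(-x**(-theta)) = u for x."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # skip the endpoints to avoid log(0) and division by zero
            draws.append((-math.log(u)) ** (-1.0 / theta))
    return draws

def score_and_hessian(theta, xs):
    """First and second derivatives in theta of the Frechet log-likelihood."""
    logs = [math.log(x) for x in xs]
    powers = [x ** (-theta) for x in xs]
    score = len(xs) / theta - sum(logs) + sum(p * l for p, l in zip(powers, logs))
    hessian = -len(xs) / theta ** 2 - sum(p * l * l for p, l in zip(powers, logs))
    return score, hessian

def newton_raphson_mle(xs, start=1.0, tol=1e-10, max_iter=100):
    """Iterate theta <- theta - score / hessian until the step is below tol."""
    theta = start
    for _ in range(max_iter):
        score, hessian = score_and_hessian(theta, xs)
        step = score / hessian
        new_theta = theta - step
        while new_theta <= 0.0:  # halve the step to stay inside the parameter space
            step /= 2.0
            new_theta = theta - step
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

rng = random.Random(2021)
sample = sample_frechet(2000, theta=2.0, rng=rng)
theta_hat = newton_raphson_mle(sample)
print(theta_hat)
```

For the two-parameter FE model the same scheme would use the score vector and the Hessian matrix in place of the scalar derivatives.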

Bayes estimators using informative prior
In Bayesian statistics, the personal belief about the unknown parameters of the model of interest is quantified in the form of a prior distribution. The sample data and the prior distribution are then combined to form the posterior distribution. This posterior distribution contains all the probabilistic information about the parameters. An informative prior provides specific and valuable information about the parameters in addition to the information carried by the data obtained from the experiment.
In this study, the prior distributions for the parameters $\lambda$ and $\beta$ are taken to be the exponential and Weibull distributions, respectively (chosen according to the support of the parameters). Assuming independent priors, the joint prior distribution of $\lambda$ and $\beta$ is the product of the two prior densities, where a, b, and c are the hyperparameters. The joint posterior distribution of the parameters of the FE distribution is obtained by combining the likelihood function and the joint prior distribution given in Eqs. (3) and (5) through the Bayes theorem. Inference about $\lambda$ and $\beta$ is based on the marginal posterior distributions of the parameters. These marginal posterior distributions do not have closed-form expressions.
The posterior risk is the expectation of the loss function with respect to the posterior distribution, and the Bayes estimators are the minimizers of these risks. The Bayes estimators and associated posterior risks of the parameters $\lambda$ and $\beta$ are evaluated under the squared error loss function (SELF), weighted loss function (WLF), quadratic loss function (QLF), precautionary loss function (PLF), and modified II loss function (MIILF). The Bayes estimators and associated posterior risks for functional forms $U(\lambda, \beta)$ of the parameters are defined through ratios of integrals over the parameter space. The expression of the Bayes estimators obtained from Eq. (8) is not in closed form. Hence, two approximation techniques, Lindley and T-K, are used to derive the Bayes estimators of the parameters of the FE distribution under the loss functions mentioned above.

Lindley approximation
Lindley (1980) provided an asymptotic solution for the ratio of two integrals, which gives a single approximate result. This technique is widely used for estimation in Bayesian analysis. Lavanya and Alexander (2016) used this technique to estimate the constant shape bi-Weibull distribution based on failure time data. Sharma, Singh, and Singh (2017) estimated all the parameters of the power Lindley distribution through this technique. Adnan (2021) obtained the Bayes estimators of the scale and shape parameters of the Weibull-Lindley Rayleigh distribution using this method. Equation (7) can be written as a ratio of two integrals in which $l(\lambda, \beta \mid x)$ and $\rho(\beta, \lambda)$ are the log-likelihood function and the log of the prior distribution, respectively. The expression (9) is evaluated using the Lindley approximation, where $\sigma_{ij}$ is the $(i,j)$th element of the variance-covariance matrix of the parameters of the FE distribution. The Bayes estimators of $\beta$ under SELF, WLF, QLF, PLF, and MIILF are obtained using expression (10), together with the corresponding posterior risks. The Bayes estimators and associated posterior risks of the parameter $\lambda$ are evaluated similarly.
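How a loss function determines a Bayes estimator and its posterior risk can be sketched from posterior moments. The sketch assumes the commonly used forms of four of the loss functions named in this section (SELF $(\theta-d)^2$, WLF $(\theta-d)^2/\theta$, QLF $((\theta-d)/\theta)^2$, PLF $(\theta-d)^2/d$); these may differ in detail from the authors' definitions, and MIILF is omitted because its form is not given in this text. A Gamma posterior is used as a hypothetical stand-in so that the moments are available exactly.

```python
import math

def bayes_estimators(m1, m2, inv1, inv2):
    """Return {loss: (estimator, posterior risk)} from posterior moments.

    m1 = E[theta], m2 = E[theta^2], inv1 = E[1/theta], inv2 = E[1/theta^2],
    all taken with respect to the posterior distribution.
    """
    return {
        "SELF": (m1, m2 - m1 ** 2),                           # posterior mean and variance
        "WLF": (1.0 / inv1, m1 - 1.0 / inv1),                 # minimises E[(theta - d)^2 / theta]
        "QLF": (inv1 / inv2, 1.0 - inv1 ** 2 / inv2),         # minimises E[(1 - d / theta)^2]
        "PLF": (math.sqrt(m2), 2.0 * (math.sqrt(m2) - m1)),   # minimises E[(theta - d)^2 / d]
    }

def gamma_posterior_moments(shape, rate):
    """Exact moments of a Gamma(shape, rate) posterior; requires shape > 2."""
    m1 = shape / rate
    m2 = shape * (shape + 1.0) / rate ** 2
    inv1 = rate / (shape - 1.0)
    inv2 = rate ** 2 / ((shape - 1.0) * (shape - 2.0))
    return m1, m2, inv1, inv2

# Hypothetical posterior, chosen only to make the arithmetic easy to follow.
results = bayes_estimators(*gamma_posterior_moments(shape=5.0, rate=2.0))
for name, (estimate, risk) in results.items():
    print(name, estimate, risk)
```

In the FE setting the four posterior moments are not available exactly, which is where the Lindley and T-K approximations enter: each moment is itself a ratio of two integrals.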

Bayes estimators and posterior risks using T-K approximation
Tierney and Kadane (1986) applied the Laplace method for the approximate evaluation of the ratio of two integrals. This technique requires the evaluation of the second derivative of the log-likelihood function, whereas the Lindley approximation requires the third derivative. This technique is widely used in the literature; Gencer and Gencer (2019, 2020), Sana and Faizan (2019), and Tanis and Saracoglu (2019), among others, applied it in their studies.
For the FE distribution, the expression for the Bayes estimators and associated posterior risks provided in Eq. (9) can be expressed using the T-K approximation, where $(\hat{\lambda}^{*}, \hat{\beta}^{*})$ and $(\hat{\lambda}, \hat{\beta})$ maximize the functions in the exponents of the numerator and denominator integrals, respectively. All the Bayes estimators and associated posterior risks of the parameters $\beta$ and $\lambda$ under SELF, WLF, QLF, PLF, and MIILF are evaluated using expression (11).
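The two approximations can be compared on a toy problem where the exact answer is known. The sketch below is not the FE model: it assumes exponential data with rate theta and a Gamma(a, b) prior, for which the posterior mean is $(a+n)/(b+s)$ with $s$ the sum of the observations. The one-parameter forms of the Lindley and T-K approximations are used, and the sample summaries are hypothetical.

```python
import math

def exact_posterior_mean(n, s, a, b):
    """Exact mean of the Gamma(a + n, b + s) posterior."""
    return (a + n) / (b + s)

def lindley_posterior_mean(n, s, a, b):
    """One-parameter Lindley approximation of E[theta | x], evaluated at the MLE."""
    mle = n / s
    var = mle ** 2 / n           # -1 / l''(mle), since l''(theta) = -n / theta^2
    l3 = 2.0 * n / mle ** 3      # third derivative of the log-likelihood at the MLE
    rho1 = (a - 1.0) / mle - b   # first derivative of the log-prior at the MLE
    # u(theta) = theta, so u' = 1 and u'' = 0 in the Lindley expansion.
    return mle + rho1 * var + 0.5 * l3 * var ** 2

def tk_posterior_mean(n, s, a, b):
    """Tierney-Kadane approximation of E[theta | x]; needs second derivatives only."""
    shape, rate = a + n, b + s

    def h(theta):       # log-prior + log-likelihood (denominator exponent)
        return (shape - 1.0) * math.log(theta) - rate * theta

    def h_star(theta):  # h(theta) + log u(theta) with u(theta) = theta (numerator exponent)
        return shape * math.log(theta) - rate * theta

    mode = (shape - 1.0) / rate                    # maximiser of h
    mode_star = shape / rate                       # maximiser of h_star
    sigma = math.sqrt(mode ** 2 / (shape - 1.0))   # (-h''(mode)) ** -0.5
    sigma_star = math.sqrt(mode_star ** 2 / shape) # (-h_star''(mode_star)) ** -0.5
    return (sigma_star / sigma) * math.exp(h_star(mode_star) - h(mode))

# Hypothetical sample summaries: n observations with total s, prior Gamma(2, 1).
n, s, a, b = 20, 50.0, 2.0, 1.0
exact = exact_posterior_mean(n, s, a, b)
lindley = lindley_posterior_mean(n, s, a, b)
tk = tk_posterior_mean(n, s, a, b)
print(exact, lindley, tk)
```

In this toy case both approximations land close to the exact posterior mean. The maximisers are available in closed form here, whereas for the FE distribution they would have to be found numerically.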

Simulation study
In this section, a Monte Carlo simulation scheme is used to study the performance of the FE distribution in both the classical and Bayesian paradigms. The classical and Bayesian estimators under different loss functions are compared for different sample sizes. For this purpose, random samples of 30, 50, 100, 200, 300, 500, 700, 1000, and 1500 observations are drawn from the FE distribution through the inverse-CDF random number generator, where U is a uniform random variate over the interval (0, 1). For the MLEs of the parameters of the distribution, the Newton-Raphson iterative process is used because the two normal equations obtained from the method of maximum likelihood cannot be solved simultaneously. The R package maxLik is used for this purpose. The Bayes estimators are evaluated using the theoretical results of the Lindley and T-K approximation methods given above. The informative priors (exponential prior for $\lambda$ and Weibull prior for $\beta$) under the SELF, WLF, QLF, PLF, and MIILF are used for the evaluation of the Bayes estimators and associated posterior risks. The values of the parameters are taken as $\lambda = 0.999$ and $\beta = 0.6$. The elicited values of the hyperparameters are $a = 1.5$, $b = 3.08$, and $c = 2.1$. All computations are carried out with programming routines written in the R language. The simulation size is taken to be 1000. The efficiency of the estimators is assessed on the basis of the minimum values of the associated posterior risks. The results of the simulation study are shown in Tables 1 and 2, where bold font marks the minimum risk value for each sample size. From Tables 1 and 2, it is evident that the performance of the Bayes estimators is better than that of the MLEs for all sample sizes, as can be seen from the values of the risks. As the sample size increases, the values of the risks decrease and approach zero. For the parameter $\beta$, the estimators obtained through the T-K approximation method are better than those from the Lindley method, with minimum risks, and SELF proves to be better than the other loss functions. For the parameter $\lambda$, the Bayes estimators under the Lindley method are better than those under the T-K method, and WLF can be considered the best loss function as it gives the smallest posterior risks.
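The structure of such a simulation study can be sketched in a few lines. The sketch is an illustration under stated assumptions, not the authors' R routines: it draws from the one-parameter Fréchet distribution by inverse transform as a stand-in for the FE generator, and it uses a simple estimator based on the variance of the log-observations rather than the MLE or Bayes estimators. The seed, replication count, and function names are hypothetical.

```python
import math
import random
import statistics

def sample_frechet(n, theta, rng):
    """Inverse-transform sampling: x = (-log U) ** (-1 / theta), U ~ Uniform(0, 1)."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # skip the endpoints to avoid log(0) and division by zero
            draws.append((-math.log(u)) ** (-1.0 / theta))
    return draws

def log_moment_estimate(xs):
    """Estimate theta from Var(log X) = pi^2 / (6 * theta^2), which holds for this Frechet form."""
    sd = statistics.stdev([math.log(x) for x in xs])
    return math.pi / (math.sqrt(6.0) * sd)

def monte_carlo_mse(theta, n, replications, rng):
    """Average squared error of the estimator over repeated simulated samples."""
    total = 0.0
    for _ in range(replications):
        estimate = log_moment_estimate(sample_frechet(n, theta, rng))
        total += (estimate - theta) ** 2
    return total / replications

rng = random.Random(117)
true_theta = 0.6
mse_by_n = {n: monte_carlo_mse(true_theta, n, 200, rng) for n in (30, 100, 500)}
for n, mse in mse_by_n.items():
    print(n, mse)
```

The mean squared error shrinks as the sample size grows, which mirrors the pattern of decreasing risks described for Tables 1 and 2.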

Analysis of recovery time for the COVID patients
In this section, the suitability of the FE model fitted to the data set of the recovery time of COVID-19 patients is studied. For this purpose, 117 patients are taken from the Nishtar Hospital, Multan, Pakistan. The data set consists of the recovery time of the COVID-19 patients admitted to the hospital and discharged after recovery. Table 3 provides the summary statistics of the data set.
For the 117 patients, the mean (± standard deviation) age is 38 (±15.913) years, and the average duration of the patients' stay from the time of admission to the hospital to the time of discharge after recovery is 15.1795 (±12.825) days.
The empirical and cumulative distributions of the data set and the goodness-of-fit curve of the FE distribution are shown in Figure 3(a) and 3(b). The hazard plot in Figure 4(b) shows that the hazard rate of COVID-19 patients is very high in the early period after admission to the hospital, then levels off, and the hazard of death decreases over time. Hence, in the early period, the likelihood of failure is greater. Table 4 compares the flexibility of the FE distribution, using the data of COVID-19 patients, with the following distributions: the inverse Rayleigh distribution by Voda (1972), with pdf $f(x, \theta) = \frac{2\theta}{x^{3}} e^{-\theta/x^{2}}$, $x, \theta > 0$; the Fréchet distribution by Fréchet (1927), with pdf $f(x, \theta) = \theta x^{-\theta-1} \exp(-x^{-\theta})$, $x, \theta > 0$; the Gompertz Fréchet distribution by Oguntunde, Khaleel, Ahmed, and Okagbue (2019); and the Weibull Fréchet distribution by Afify, Yousof, Cordeiro, Ortega, and Nofal (2016), with $x, \alpha, \beta, a, b > 0$. We have compared the distributions on the basis of the maximized log-likelihood, Akaike information criterion (AIC), Bayesian information criterion (BIC), corrected Akaike information criterion (CAIC), and Hannan-Quinn information criterion (HQIC). The distribution with the smaller values of these criteria for the data set is considered to be better.
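The four criteria can be computed from a maximized log-likelihood with a short helper. The sketch assumes the standard textbook definitions, with CAIC taken as the small-sample corrected AIC as named in this article; the log-likelihood value in the usage line is hypothetical and is not a result from Table 4.

```python
import math

def information_criteria(log_lik, k, n):
    """AIC, BIC, corrected AIC, and HQIC for a model with k parameters fitted to n observations."""
    aic = 2.0 * k - 2.0 * log_lik
    return {
        "AIC": aic,
        "BIC": k * math.log(n) - 2.0 * log_lik,
        "CAIC": aic + 2.0 * k * (k + 1) / (n - k - 1),   # small-sample correction of AIC
        "HQIC": 2.0 * k * math.log(math.log(n)) - 2.0 * log_lik,
    }

# Hypothetical maximized log-likelihood for a two-parameter model and 117 observations.
criteria = information_criteria(log_lik=-400.0, k=2, n=117)
for name, value in criteria.items():
    print(name, value)
```

All four criteria penalize the number of parameters, so a model with more parameters is preferred only if its log-likelihood gain outweighs the penalty.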
Figure 5 shows the goodness-of-fit curve of the FE distribution together with those of the competing distributions for the data set. It shows that the FE model fits the data appropriately and turns out to be more flexible than its competitors.
For the data set of the survival times of the COVID-19 patients, the MLEs of the parameters $\lambda$ and $\beta$ of the distribution are not available in closed form and are therefore evaluated using the Newton-Raphson method. The Bayes estimates of the parameters using informative priors (exponential prior for $\lambda$ and Weibull prior for $\beta$) are evaluated using the Lindley and T-K approximation methods. These MLEs and Bayes estimates are compared on the basis of the minimum values of the associated risks. The results are shown in Table 5. They show that the Bayes estimators of the parameters of the FE distribution perform better than the MLEs, as they have the minimum values of posterior risks. For the parameter $\lambda$, the estimators under the T-K method using SELF prove to be better than the other Bayes estimators. The estimators of $\beta$ under the Lindley method using SELF are better, with minimum risks, for the given data set. Table 6 shows the mean and standard deviation (sd) of the data set and those obtained from the FE distribution via the MLEs and the Bayes estimators using the Lindley and T-K methods. The mean and sd obtained from the data set and from the FE distribution are close to each other, which indicates that the FE distribution is suitable for the data set.

Concluding remarks
In this study, a univariate statistical distribution is proposed to model the recovery time of a sample of 117 COVID-19 patients. The data set is taken from the Nishtar Hospital, Multan, Pakistan. The parameters are estimated in the classical and Bayesian paradigms. The average duration from the patients' admission to the hospital to their discharge after recovery is found to be 14 to 15 days. The Bayes estimators are evaluated with informative priors using the Lindley and T-K approximation methods. The results of the simulation study and the real-life data set show that the Bayes estimators are more efficient than the MLEs, with minimum values of risks. The FE distribution turns out to be a better fit than the standard distributions for the given data set. It is hoped that health practitioners and medical researchers may use the proposed probability distribution to model different medical and health phenomena. In addition, Bayesian estimation has proved to be better for parameter estimation than classical methods in this study. The study can also be extended to censored data.

Figure 1. The PDF and CDF plots of the FE distribution for various values of $\beta$ and $\lambda$.

Figure 2. Survival, hazard, and cumulative hazard function plots of the FE distribution.

Figure 3. (a) Empirical and cumulative distribution of the data set. (b) The goodness of fit curve of the FE distribution with the data set.
It is evident from Figure 3(a) and 3(b) that the data about the recovery time of COVID-19 patients adequately fit the FE distribution. In Figure 4, the reliability plots of the FE distribution for the COVID-19 patients' data set are displayed, including the hazard plot in Figure 4(b).

Figure 4. Reliability plots of the FE distribution for the COVID-19 data set.

Figure 5. Goodness of fit curve of the FE distribution with the competitive distributions.

Table 1. The MLEs and Bayes estimators of $\beta$ with values of risks under the defined loss functions.

Table 2. The MLEs and Bayes estimators of $\lambda$ with values of risks under the defined loss functions.

Table 3. Summary statistics of recovery time (days) of the COVID-19 patients.

Table 4. MLE, log-likelihood, AIC, BIC, CAIC, and HQIC of the distributions based on the data set.

Table 5. The MLEs and Bayes estimators with associated risks of $\lambda$ and $\beta$ for the COVID-19 data.

Table 6. The mean and standard deviation of the data and of the FE distribution under the MLEs and Bayes estimates.