Parametric Survival Models of Hemodialysis Patients in Relation with Patient-Related Factors

1 College of Science, Sudan University of Science and Technology
2 College of Computer Science and Mathematics, University of Bahri, Sudan
3 College of Medical Laboratory Science, Sudan University of Science and Technology
4 College of Social and Economic Studies, University of Bahri, Sudan
5 College of Petroleum Geology and Minerals, University of Bahri, Sudan
Corresponding author: Reem Yousif MEKK, College of Science, Sudan University of Science and Technology. E-mail: reem11229@hotmail.com
Abstract


INTRODUCTION
Survival analysis focuses on estimating the probability that an individual will survive for a given length of time until an event such as death. It is particularly useful when the probability of occurrence of the event under study changes with time [1][2][3] .
The final stage of chronic kidney disease is end-stage renal disease (ESRD), which is characterized by progressive, permanent kidney failure. Dialysis therapy is a procedure aimed at eliminating the body's waste and toxic substances and compensating for the loss of kidney function. One class of dialysis is hemodialysis 4 . It has been estimated that more than 1.1 million patients worldwide have ESRD, with an increase of 7 percent annually. For example, incidence and prevalence levels in the United States are projected to increase by 44% and 85%, respectively, tients with hemodialysis who have stayed for a brief period of time and those in emergency conditions have been removed.

DATA ANALYSIS
Descriptive statistics (percentages and frequencies) were computed using Microsoft Excel. The variables considered in this analysis comprised qualitative variables (sex, marital status, education, occupation, daily dialysis, weekly dialysis frequency, hospital, diabetes mellitus and hypertension, diabetes mellitus, polycystic kidney disease, renal obstruction, shrunken kidney, unknown and other) and quantitative variables, which were classified into effective variables.
The log-rank test is a statistical test used to compare the survival distributions of two or more groups; it tests the null hypothesis that there is no difference between the categories of a variable. It does not provide any estimate of the actual size of the effect; in other words, it provides a statistical, but not a clinical, assessment of the effect of the factor.
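As a rough illustration of the log-rank comparison described above, the following sketch computes the two-group statistic in plain Python. The function name `logrank_two_groups` and the toy survival times are assumptions made for illustration only; they are not the study data, and in practice a statistical package would normally be used.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up times (e.g. months)
    events_* : 1 if death was observed, 0 if the time is censored
    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct times at which at least one death was observed.
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group A
    variance = 0.0       # hypergeometric variance of that sum
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)       # deaths, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical toy data: group A dies early, group B dies late.
chi_square, p_value = logrank_two_groups(
    [1, 2, 3, 4], [1, 1, 1, 1],
    [10, 11, 12, 13], [1, 1, 1, 1],
)
```

A small p-value indicates only that the survival curves differ; as noted above, it says nothing about the size of the difference.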
In this analysis, the quantitative variables were not normally distributed according to the Kolmogorov-Smirnov test at the 0.05 significance level, so the parametric method was used.
The survival analysis applied in this study relied on both univariate models, which involve only one explanatory variable, and multivariate models, which involve at least two explanatory variables 7 , for patients with chronic kidney disease diagnosed with ESRD under hemodialysis care, as shown in Tables 5 and 6.

Parametric models
In a parametric survival model, survival time is assumed to follow a certain distribution whose probability density function is specified by unknown parameters. The Weibull, exponential, Gompertz, log-logistic, lognormal and gamma distributions are widely used 1,8,9 .

Parametric proportional hazard (PH) Models
Cox (1972) introduced the proportional hazards (PH) model, also known as the Cox regression model; in a parametric PH model the baseline hazard is additionally assumed to follow a specified distribution. The widely used models are the exponential, Weibull and Gompertz distributions 2,10,11 .
from 2000 to 2015, and incidence and prevalence rates per million population by 32% and 70%. The progress of ESRD patients in developing countries has similar trends 5 .
Sudan is one of the countries where chronic kidney failure is an alarming problem. The reported incidence of new ESRD cases in Sudan is 70-140 per million people annually. The available data on the root causes of renal disease leading to chronic renal disease are very limited 5,6 .
Scientific studies have uncovered major causes of end-stage renal disease that influence survival time. These causes affect how long hemodialysis patients survive. Millions of people around the world are affected by kidney disease.
This study aims to compare the performance of different parametric models of the survival of hemodialysis patients. Parametric models were selected to estimate the survival probabilities. Applying these models helps to identify the prognostic factors associated with an increased probability of survival.

MATERIALS AND METHODS
This study included 325 hemodialysis patients who were referred to the public hospitals Ahmed Gasim, Ibn Sina, Omdurman, Selma center, Bahri, and Ribat in Khartoum State between December 2005 and December 2010 and were then followed up until 2015. The data captured age, date of diagnosis of the disease, survival status (dead or alive), survival time in months, sex, marital status, education level and occupation.

DATA COLLECTION
Khartoum State comprises three major cities: Khartoum, Bahri and Omdurman. The study data were collected from the largest and best-known public hospitals in these three cities. The total number of patients covered in our study was 325.

INCLUSION CRITERIA
All hemodialysis patients of any age who were referred to the 6 public hospitals in the period from December 2005 to December 2015 were included.

EXCLUSION CRITERIA
Hemodialysis patients who had been diagnosed with acute renal failure, inadequate medical history, pa-
The shape and scale parameters are denoted $\gamma$ and $\lambda$, respectively. Under the Weibull PH model, the hazard function of a specific patient with covariates $x_1, x_2, \ldots, x_p$ is given by
$$h_i(t) = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)\,\lambda\gamma t^{\gamma-1}.$$

Gompertz Distribution
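The Gompertz model has found application in demography and the biological sciences. Its hazard function is $h(t) = \lambda e^{\theta t}$ for $t \ge 0$, with $\lambda > 0$. In the particular case where $\theta = 0$, the hazard function has the constant value $\lambda$. The hazard function increases with time if $\theta > 0$ and decreases with time if $\theta < 0$ 2,11 .
Since $\log h(t) = \log\lambda + \theta t$, the log-hazard function is linear in $t$. The Gompertz hazard therefore increases or decreases monotonically. The survival function is
$$S(t) = \exp\left\{\frac{\lambda}{\theta}\left(1 - e^{\theta t}\right)\right\},$$
and the corresponding density function is
$$f(t) = \lambda e^{\theta t}\exp\left\{\frac{\lambda}{\theta}\left(1 - e^{\theta t}\right)\right\}.$$
Under the Gompertz PH model, the hazard function of a particular patient is given by
$$h_i(t) = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)\,\lambda e^{\theta t}.$$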
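The Gompertz quantities discussed in this section can be sketched numerically as follows. The sketch assumes the common parameterisation with hazard $\lambda e^{\theta t}$; the function names, parameter values and covariate values are hypothetical and are not estimates from this study.

```python
import math


def gompertz_hazard(t, lam, theta):
    """Baseline Gompertz hazard h(t) = lam * exp(theta * t)."""
    return lam * math.exp(theta * t)


def gompertz_survival(t, lam, theta):
    """Survival function S(t) = exp((lam / theta) * (1 - exp(theta * t)))."""
    if theta == 0.0:
        # Limiting case: constant hazard, i.e. the exponential distribution.
        return math.exp(-lam * t)
    return math.exp((lam / theta) * (1.0 - math.exp(theta * t)))


def gompertz_density(t, lam, theta):
    """Density f(t) = h(t) * S(t)."""
    return gompertz_hazard(t, lam, theta) * gompertz_survival(t, lam, theta)


def gompertz_ph_hazard(t, lam, theta, betas, x):
    """PH hazard of one patient: exp(beta'x) times the baseline hazard."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return math.exp(linear_predictor) * gompertz_hazard(t, lam, theta)


# Hypothetical values: survival probability at 60 months.
s_60 = gompertz_survival(60.0, lam=0.005, theta=0.01)
```

Because the covariates enter only through the multiplier $\exp(\beta'x)$, the ratio of two patients' hazards does not depend on $t$, which is the proportional hazards property.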

Selection Criterion
Many criteria are used to choose the best of several models fitted to the same data for future prediction. Among them are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Cox-Snell Information Criterion (CSIC), the last of which is a graphical rather than a mathematical criterion.

AIC:
On the basis of this statistic, comparisons may also be made between a variety of candidate models, which do not necessarily need to be nested 2,12-14 .

Exponential Distribution
The simplest and most important distribution in survival studies is the exponential distribution. It is often referred to as a purely random failure pattern.
The hazard function is $h(t) = \lambda$ for $t \ge 0$, with $\lambda > 0$. The corresponding survivorship function is $S(t) = e^{-\lambda t}$, and so the implied probability density function is $f(t) = \lambda e^{-\lambda t}$. A high $\lambda$ value indicates high risk and short survival; a low $\lambda$ value indicates low risk and long survival. The distribution is also referred to as the unit exponential when $\lambda = 1$ 2,3,11 .
Under the exponential PH model, the hazard function of a particular patient with covariates $x_1, x_2, x_3, \ldots, x_p$ is given by
$$h_i(t) = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)\,\lambda.$$
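A minimal sketch of the exponential model may make these relationships concrete. It assumes a constant baseline hazard $\lambda$; the function names and the numerical values of $\lambda$, the coefficients and the covariates are hypothetical.

```python
import math


def exp_survival(t, lam):
    """Survivorship function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def exp_density(t, lam):
    """Density f(t) = lam * exp(-lam * t), i.e. hazard times survival."""
    return lam * exp_survival(t, lam)


def exp_median(lam):
    """Median survival time, the t at which S(t) = 0.5."""
    return math.log(2.0) / lam


def exp_ph_hazard(lam, betas, x):
    """PH hazard of one patient: exp(beta'x) * lam, constant in time."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return math.exp(linear_predictor) * lam


# Hypothetical example: a larger lam gives a shorter median survival time.
median_high_risk = exp_median(0.05)
median_low_risk = exp_median(0.01)
```

The comparison of the two medians reflects the statement above that a high $\lambda$ corresponds to high risk and short survival.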

Weibull Distribution
The Weibull distribution was proposed by Weibull (1939), who discussed its applicability to different failure situations again in Weibull (1951). It has since been used in many studies of reliability and of mortality from human diseases 2,3,11 .

A more general form of hazard function is such that $h(t) = \lambda\gamma t^{\gamma-1}$ for $0 \le t < \infty$. The survivor function is
$$S(t) = \exp(-\lambda t^{\gamma}).$$
The corresponding probability density function is then
$$f(t) = \lambda\gamma t^{\gamma-1}\exp(-\lambda t^{\gamma}).$$

RESULTS
A total of 325 hemodialysis patients were enrolled in this study. In terms of sex, 59.7% were male and 40.3% were female. By December 2015, 52.3% of patients had died and 47.7% were still alive. In terms of marital status, 2.5% were divorced, 71.4% were married, 24% were single and 2.2% were widowed. In terms of education, 7.7% of patients were illiterate, 32.6% had received basic education, 4% had intermediate education, 39.1% had completed secondary education and 16% were graduates. In terms of occupation, 18.8% were employees, 13.8% were freelancers, 41.2% were unemployed, 3.7% were police officers, 4.3% were retired, 7.4% were students and 11.8% were professionals.
Regarding the quantitative variable age, the minimum age was 6 years and the maximum age was 88 years. The first quartile was 46.03 years, the median age was 45 years and the third quartile was 75 years. The detailed results are presented in Table 1.
The clinical characteristics showed that 88.9% of patients attended hemodialysis regularly and 11.1% attended sporadically; 27.4% had diabetes mellitus and 72.6% did not. 29.5% had hypertension and 70.5% had no hypertension. 89.8% had both diabetes mellitus and hypertension, and 10.2% had neither diabetes mellitus nor hypertension. 3.4% had shrunken kidneys and 96.6% had no shrunken kidneys. Dialysis frequency per week was two times for 8.8% of patients and three times for 81.2%. Polycystic kidney disease was absent in 94.8% of patients. 8.0% had renal obstruction and 92.0% had no renal obstruction. The cause was unknown for 9.5% and not unknown for 90.5%. 5.8% had other causes and 94.2% had no other causes.
Based on the log-rank test, the variables found to be significant in Table 1 and Table 2 (p-value < 0.05) were entered in the main parametric model, while the other variables were not significant and were excluded from the parametric model. The variables used in the parametric model were regular dialysis, dialysis frequency per week, hospitals, diabetes mellitus, hypertension, diabetes mellitus and hypertension, shrunken kidneys, and other.
The median overall survival time was estimated at 84 months (95% confidence interval: 61-89), as shown in Figure 1, which presents the overall survival curve of hemodialysis patients.
For univariate analysis, an additive Weibull and Gompertz model are found with similar meaningful
$$\mathrm{AIC} = -2\log L + 2(P + K),$$
where $P$ is the number of parameters of the distribution and $K$ is the number of coefficients (excluding the constant) in the model; $P = 1$ for the exponential and $P = 2$ for the Weibull and Gompertz. The smaller the value of this statistic, the better the model. This statistic is known as Akaike's Information Criterion.
The Bayesian Information Criterion is
$$\mathrm{BIC} = -2\log L + (P + K)\log n,$$
where $P$ is the number of parameters in the distribution, $K$ is the number of coefficients and $n$ is the number of observations. The distribution with the lowest BIC value is considered the best-fit model. As a descriptive statistic for the goodness of fit of a proportional hazards regression model, Hosmer and Lemeshow propose the following 17,18 :
$$R_p^2 = 1 - \exp\left[\frac{2}{n}\left(L_0 - L_p\right)\right],$$
where $L_p$ is the log likelihood for the fitted model with $p$ covariates and $L_0$ is the log likelihood for model zero, the model without covariates.
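The three selection statistics can be computed directly from fitted log likelihoods. The sketch below assumes the forms AIC = -2 log L + 2(P + K), BIC = -2 log L + (P + K) log n and R² = 1 - exp[(2/n)(L0 - Lp)]; the log-likelihood values and the number of coefficients are hypothetical placeholders, not results from this study, and only n = 325 is taken from the text.

```python
import math


def aic(log_lik, n_dist_params, n_coefs):
    """Akaike Information Criterion: -2 log L + 2 (P + K)."""
    return -2.0 * log_lik + 2.0 * (n_dist_params + n_coefs)


def bic(log_lik, n_dist_params, n_coefs, n_obs):
    """Bayesian Information Criterion: -2 log L + (P + K) log n."""
    return -2.0 * log_lik + (n_dist_params + n_coefs) * math.log(n_obs)


def r_squared(log_lik_null, log_lik_fitted, n_obs):
    """Descriptive R^2 = 1 - exp[(2 / n) (L0 - Lp)]."""
    return 1.0 - math.exp((2.0 / n_obs) * (log_lik_null - log_lik_fitted))


# Hypothetical log likelihoods for three candidate models with K = 4 coefficients.
N_OBS = 325
candidates = {
    "exponential": {"log_lik": -410.0, "P": 1},
    "weibull": {"log_lik": -405.0, "P": 2},
    "gompertz": {"log_lik": -400.0, "P": 2},
}
scores = {
    name: (aic(m["log_lik"], m["P"], 4), bic(m["log_lik"], m["P"], 4, N_OBS))
    for name, m in candidates.items()
}
# The preferred model is the one with the smallest AIC (ties aside).
best_by_aic = min(scores, key=lambda name: scores[name][0])
```

Both criteria penalise the log likelihood for model size, so a two-parameter distribution is preferred over the exponential only if its fit improves by more than the penalty.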

Ethical Considerations
The study protocol was authorised by the ethics and research committees of the Ministry of Health of Khartoum (serial number: KMOH-REC-1-2020). Hospitals received informed consent.  Table 3, 4, and 5).
The fit criteria for the univariate parametric models are shown in Table 7. We also found that the Gompertz model is the most suitable one for future forecasting, as it has the lowest AIC and BIC values.
For these three models, Figures 2, 3 and 4 display the Cox-Snell residuals: the cumulative hazard function of the Cox-Snell residuals (vertical axis) is plotted against the Cox-Snell residuals (horizontal axis). The smaller the deviation of the residuals from the straight line through the origin with a slope of 1, the better the fit of the survival model. Based on the criteria (AIC, BIC) and the Cox-Snell residuals, the Gompertz model is therefore the better model compared with the other parametric models.
findings for all Wald test variables (P-value < 0.05). Based on hazard ratios, age, diabetes mellitus, and diabetes mellitus + hypertension were observed to increase the risk of death, while the other variables (regular, hospital, hypertension, shrunken kidney, dialysis frequency per week, and other) were found to have substantially higher survival rates (see Table 3, 4, and 5).
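The Cox-Snell check mentioned above can be sketched as follows for a Gompertz PH model. The residual of a patient is the fitted cumulative hazard at the observed time, and the Nelson-Aalen estimate is used here as the cumulative hazard of the residuals. This is an assumption about how such a plot is usually built, not a description of the authors' software; the function names, parameters and data are hypothetical.

```python
import math


def gompertz_cox_snell(times, linear_predictors, lam, theta):
    """Cox-Snell residuals r_i = exp(lp_i) * (lam / theta) * (exp(theta * t_i) - 1)."""
    residuals = []
    for t, lp in zip(times, linear_predictors):
        if theta == 0.0:
            base_cum_hazard = lam * t  # exponential limiting case
        else:
            base_cum_hazard = (lam / theta) * (math.exp(theta * t) - 1.0)
        residuals.append(math.exp(lp) * base_cum_hazard)
    return residuals


def nelson_aalen(residuals, events):
    """Nelson-Aalen cumulative hazard of the residuals.

    events[i] is 1 for an observed death and 0 for a censored time.
    Returns (residual, cumulative hazard) pairs at the observed events.
    """
    data = sorted(zip(residuals, events))
    n = len(data)
    points = []
    cumulative = 0.0
    i = 0
    while i < n:
        value = data[i][0]
        j = i
        deaths = 0
        while j < n and data[j][0] == value:  # group tied residuals
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            cumulative += deaths / (n - i)  # n - i subjects still at risk
            points.append((value, cumulative))
        i = j
    return points


# Hypothetical data: four patients, one censored.
res = gompertz_cox_snell([12.0, 30.0, 45.0, 60.0], [0.2, -0.1, 0.0, 0.3], 0.01, 0.02)
curve = nelson_aalen(res, [1, 0, 1, 1])
```

For a well-fitting model, the points in `curve` would lie close to the line through the origin with slope 1.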
Based on a multivariate analysis, it was assessed that risk factors, including age, hospital, dialysis frequen-

DISCUSSION
This research compared various parametric (PH) models to determine the best model for assessing and analyzing the risk factors affecting the survival of hemodialysis patients in public hospitals in Khartoum State.
In this analysis, we closely tracked the medical history of the targeted patients in the hospitals over the period preceding the event of interest (death) or the last observation while alive.
In the analysis of survival data, the focus is always on the probability or risk of death at any time after the initial period. One of the reasons for modeling survival data is to determine which combinations of potential explanatory variables affect the form of the hazard function, in particular the effect of the care on the risk of death. A further reason for modeling the hazard function is to obtain an estimate of the hazard function itself for individuals 2 . The quantitative variables were not normally distributed in this research, as indicated in the data analysis, so the parametric models were used. The variables that were significant in the log-rank test were incorporated into the parametric models. Univariate and multivariate analyses were used to estimate the effects of the variables.
Table 7. Scores of Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for Univariate Parametric Models
The univariate analysis for the three models (Exponential, Weibull, and Gompertz) showed that all variables had significant effects. We also found that age, diabetes mellitus, and both diabetes mellitus and hypertension increased the risk of death in patients (shorter survival) and so influenced survival in the univariate models of this study. Other variables (regular, hospital, hypertension, shrunken kidneys, dialysis frequency per week, others) decreased the risk of death (longer survival) and have a direct effect on the   riables (daily dialysis, hospital, age, dialysis frequency per week) must therefore be included in the model. The final multivariate Gompertz PH model was then defined.
According to the HR, variables including age, diabetes mellitus, and diabetes mellitus + hypertension were highly significant risk factors for hemodialysis patients in the three models used in this research, particularly in the multivariate analysis, whereas other factors, such as regular dialysis, hypertension, shrunken kidneys, dialysis frequency per week, and other, were associated with a substantially lower risk of death.
This research has limitations, namely the incompleteness of most patient records and the lack of data, which make it difficult to determine the real causes of the spread of the disease in Sudan. As a result, certain variables were not included in this analysis because they were not recorded in the patients' medical records.

CONCLUSION
The Gompertz distribution model was the best for the analysis of hemodialysis patients. Some variables (age, daily dialysis, hospital, dialysis frequency per week) were significant factors. The study clarified that some variables, such as regular dialysis, were significant factors.
Compliance with ethics requirements: The authors declare no conflict of interest regarding this article. The authors declare that all the procedures and experiments of this study respect the ethical standards in the Helsinki Declaration of 1975, as revised in 2008(5), as well as the national law. Informed consent was obtained from all the patients included in the study.
survival of the hemodialysis patient.
Multivariate analysis showed that many variables were not significant. Based on the criteria (AIC, BIC), the highest $R^2$ and the Cox-Snell residuals, we found that the Gompertz model is the best model. It is therefore the most efficient fit among the parametric models for the hemodialysis patient data. Similar findings were reported in a previous study 20 , which indicated that age and diabetes mellitus were significant variables, while hypertension and the frequency of dialysis per week were not significant. Univariate and multivariate statistical analysis of the Exponential, Weibull and Gompertz models in hemodialysis patients showed that the Weibull was chosen as the most effective model. The univariate findings were significant in this analysis, but they were not significant in the multivariate analysis. In another study 21 , age, diabetes mellitus and hypertension were not significant, except for clinical significance. It was shown that the Weibull model was the best fit among the parametric models of hemodialysis patients. In this analysis, only the clinic and hospital variable was significant in the Exponential model.
In the Gompertz multivariate model, the Wald test was significant (P < 0.05) for several variables: daily dialysis ($\beta$ = -0.514, HR = $e^{\beta}$ = 0.598, P = 0.025), hospital ($\beta$ = -0.191, HR = $e^{\beta}$ = 0.826, P = 0.001), age ($\beta$ = 0.015, HR = $e^{\beta}$ = 1.015, P = 0.009) and dialysis frequency per week ($\beta$ = -0.359, HR = $e^{\beta}$ = 0.698, P = 0.040). The estimated coefficients and hazard ratios were therefore significant and had a clear impact on the survival of hemodialysis patients in this study.
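The hazard ratios quoted above follow from the coefficients by exponentiation. The short sketch below reproduces that arithmetic from the four reported coefficients; the variable names are illustrative only.

```python
import math

# Coefficients of the multivariate Gompertz model as reported in the text.
coefficients = {
    "daily dialysis": -0.514,
    "hospital": -0.191,
    "age": 0.015,
    "dialysis frequency per week": -0.359,
}

# Hazard ratio for a one-unit increase in each variable: HR = exp(coefficient).
hazard_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, hr in hazard_ratios.items():
    direction = "higher" if hr > 1.0 else "lower"
    print(f"{name}: HR = {hr:.3f} ({direction} hazard of death)")
```

A hazard ratio below 1 corresponds to a negative coefficient and a lower hazard of death, while the ratio of 1.015 for age corresponds to a hazard about 1.5% higher per additional year, assuming age was entered in years.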
In the light of the above study, the model centered on these variables was important among the other models that were excluded from this research. These va-