Modeling the Risk of Renal Impairment using Current Status Chronic Kidney Disease Data: A Simulation-based Analysis

Chronic kidney disease (CKD) is an emerging public health concern in India. We investigate the hazard of renal impairment and its risk factors in a survival modeling framework, using simulated data constructed from the CKD prevalences reported by several population-based cross-sectional surveys in India. Because cross-sectional surveys record only the current status of kidney function at the time of examination, not the exact onset time, the resulting data are current status survival data. The survival analysis of our simulated data provides evidence of increased risk of developing CKD for female, diabetic and hypertensive patients.


INTRODUCTION
Chronic Kidney Disease (CKD) is a progressive illness that is silently becoming one of the major public health concerns of India. CKD is associated with increased risk of cardiovascular morbidity, all-cause mortality and other complications of reduced renal function [1,2]. It has a prolonged asymptomatic period, followed by progression to end stage renal disease (ESRD) or kidney failure. If not detected and managed in time, substantial loss of kidney function might occur before clinical symptoms become apparent. The 'Indian CKD Registry' reported that about 70% of the patients presented late for treatment [3]. Patients diagnosed with ESRD require dialysis or transplantation for survival, which eventually results in an increased competing risk of cardiovascular morbidity and mortality, and a decline in quality of life. However, patients with CKD are more likely to die, mainly from cardiovascular complications, than to develop ESRD [1,4]. Progression to kidney failure or other adverse outcomes could be prevented or delayed through early detection and treatment [5,6]. Therefore, the time elapsed before severe damage to the kidney is a missed opportunity to prevent or delay the progression to ESRD.
Thus, assessing the risk of developing chronic renal impairment would contribute greatly to patient care and management. Here, we attempt a survival analysis to estimate the hazard of developing renal impairment and to identify its risk factors. Survival modeling usually requires the onset time of the event of interest, which may be gathered through a longitudinal study of patients with individual follow-up. However, our effort is constrained by the lack of maintained registry data with regular follow-ups in hospitals in India. So far, most studies in India have been conducted to estimate the prevalence and other epidemiological parameters of CKD. These studies are either population-based observational, cross-sectional studies [7][8][9][10][11][12] or hospital-based studies [13][14][15][16].
Therefore, we attempt to estimate the risk of developing renal impairment using survival modeling within the constraints of the reported cross-sectional survey data in India. We generated simulated CKD data using the prevalence estimates reported in these population-based studies. Survival analyses were then performed on the simulated data to estimate the risk of renal impairment and identify its risk factors. We begin with a description of the simulated data in Section-2. Section-3 describes the survival analysis techniques. Results of the survival analyses on the simulated CKD data are presented in Section-4, followed by a discussion in Section-5.
Our event of interest is renal impairment, defined as CKD stage-3 onwards or low eGFR (eGFR ≤ 60 ml/min/1.73 m²). We generated a hypothetical data set of 1500 cases, simulated on the basis of information reported in some of the studies. The reported prevalences (in %) of low GFR in India are 0.86, 1.39, 0.79, 13.3, 3.02 and 17.4 [7-12]. We averaged these estimates to derive an empirical prevalence of low GFR in the adult population of about 6%.
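The averaging step can be checked with a few lines of Python. This is only an arithmetic sketch using the six prevalences quoted above; the variable names are illustrative, and the mean is unweighted, so each survey counts equally regardless of its sample size.

```python
# Reported prevalences (%) of low GFR from the six population-based surveys [7-12].
reported_prevalence = [0.86, 1.39, 0.79, 13.3, 3.02, 17.4]

# Unweighted mean: every survey contributes equally, whatever its sample size.
mean_prevalence = sum(reported_prevalence) / len(reported_prevalence)

print(f"Mean prevalence of low GFR: {mean_prevalence:.2f}%")  # about 6.13%, rounded to 6%
```

The individual estimates range from under 1% to over 17%, so the 6% figure is best read as a rough working value rather than a precise national prevalence.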
In our simulated population, every subject is examined once to determine the status of renal function at the time of survey. Since renal impairment is a chronic non-fatal condition, the onset time of the damage cannot be observed exactly; only the current status of renal function is known at the time of examination. The age (in years) of a subject at examination was therefore taken as the observation time (say, O). We generated the observation times from a truncated Normal (54, 12.73) distribution on the interval (20, 85). Time to renal impairment (say, T) was generated from an exponential distribution on the interval (20, 100), with the parameter chosen so that the rate of event onset is 6%. If T ≤ O, the time to renal impairment is left censored; otherwise the observation is right censored. Since information was available only on Sex, Diabetes Mellitus (DM) and Hypertension (HTN), we generated only these covariates, each from a Bernoulli distribution. We generated several data sets sequentially and selected the one whose descriptive characteristics were closest to those of the reported studies.
An individual with renal damage might have developed the condition (realized the event) before the examination, and an individual with initial kidney impairment might develop severe dysfunction afterwards. Hence the collected data reflect the current status of renal impairment of a subject rather than the exact onset time. The data from this simulation study can therefore be categorized as Current Status or Case-1 Interval Censored data [18].
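A minimal sketch of such a data-generating scheme is given below. It is not the exact procedure used for the study, and several choices are assumptions made only for illustration: 12.73 is read as the standard deviation, onset is modeled as 20 plus an exponential variate capped at 100, the exponential rate is calibrated by bisection, the probability of being female (0.5) is a placeholder, and the covariates are drawn independently of the onset time. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, for reproducibility only
n = 1500

def truncated_normal(mean, sd, low, high, size, rng):
    """Draw from Normal(mean, sd) restricted to (low, high) by rejection."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mean, sd, size)
        out = np.concatenate([out, draw[(draw > low) & (draw < high)]])
    return out[:size]

# Observation time O: age at examination (assumes 12.73 is the SD).
obs_age = truncated_normal(54.0, 12.73, 20.0, 85.0, n, rng)

def expected_event_fraction(rate, obs):
    """P(T <= O) averaged over the sample when T - 20 ~ Exponential(rate)."""
    return np.mean(1.0 - np.exp(-rate * (obs - 20.0)))

# Bisection on the rate so that about 6% of subjects have T <= O.
lo, hi = 1e-6, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if expected_event_fraction(mid, obs_age) < 0.06:
        lo = mid
    else:
        hi = mid
rate = 0.5 * (lo + hi)

# Onset age T, capped at 100. The cap never changes the status because O < 85.
onset_age = np.minimum(20.0 + rng.exponential(1.0 / rate, n), 100.0)

# Current status indicator: 1 if impairment is present at examination (T <= O).
delta = (onset_age <= obs_age).astype(int)

# Bernoulli covariates; 0.5 for sex is a placeholder, not a value from the text.
female = rng.binomial(1, 0.5, n)
diabetes = rng.binomial(1, 0.16, n)
hypertension = rng.binomial(1, 0.204, n)

print(f"rate = {rate:.5f}, observed event fraction = {delta.mean():.3f}")
```

Only `obs_age`, `delta` and the covariates would be available to the analyst; `onset_age` is kept here solely to construct the status indicator. Because the covariates in this sketch are independent of the onset time, reproducing covariate effects would require an additional modeling step that the description above does not specify.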

3. STATISTICAL ANALYSIS OF CURRENT STATUS SURVIVAL DATA
Let $Y_i$ be the time to renal impairment (denoted T in the description of the simulated data), $T_i$ the examination (censoring) time (denoted O there), independent of $Y_i$, and $Z_i$ a $p \times 1$ vector of covariates for the $i$th subject. Denote $\delta_i = 1$ when the $i$th subject showed renal impairment at examination ($Y_i \le T_i$) and $\delta_i = 0$ otherwise. The observed data are $\{(t_i, \delta_i, Z_i),\ i = 1, 2, \ldots, n\}$. Denote the survival function and cumulative hazard function (CHF) of the $i$th subject by $S(\cdot\,; Z_i)$ and $L(\cdot\,; Z_i)$ respectively.

Nonparametric Maximum Likelihood Estimates (NPMLE) of Surviving Renal Impairment
The probability of developing (failure distribution) or surviving (survival distribution) renal impairment was estimated using the nonparametric maximum likelihood method for current status data. The overall survival curve was compared with the Kaplan-Meier (KM) plot obtained by taking age at the time of examination as the exact failure time. Survival curves were also compared between the levels of each risk factor considered in our study. We used the Iterative Convex Minorant (ICM) algorithm for computing the NPMLE for current status data because of its efficient convergence [19][20][21]. The steps of this algorithm are:
1. Order the examination times, $t_{(1)} \le t_{(2)} \le \ldots \le t_{(n)}$, and relabel the $\delta_i$ accordingly as $\delta_{(1)}, \delta_{(2)}, \ldots, \delta_{(n)}$.
2. Plot the cumulative sum diagram $\left(i, \sum_{j=1}^{i} \delta_{(j)}\right)$, $i = 1, 2, \ldots, n$, starting from the origin $(0, 0)$.
3. Form the greatest convex minorant $G^*$ of the points in step 2.
4. The NPMLE $F_n(t_{(i)})$ is then the left-derivative of $G^*$ at $i$, which can further be expressed as
$$F_n(t_{(i)}) = \max_{j \le i} \min_{k \ge i} \frac{\sum_{m=j}^{k} \delta_{(m)}}{k - j + 1}.$$
The estimated survival probabilities were then compared between levels of covariates using a log-rank type test [18]. Any p-value less than 0.05 was considered significant.
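For current status data, the left-derivative of the greatest convex minorant of the cumulative sum diagram coincides with the isotonic regression of the ordered status indicators, which can be computed with the pool-adjacent-violators algorithm. The sketch below uses that equivalent route rather than an explicit convex minorant construction; it assumes distinct examination times, and the function name is illustrative.

```python
import numpy as np

def npmle_current_status(exam_times, delta):
    """NPMLE of F at the ordered examination times via pool-adjacent-violators."""
    order = np.argsort(exam_times, kind="stable")
    t_sorted = np.asarray(exam_times, dtype=float)[order]
    d_sorted = np.asarray(delta, dtype=float)[order]

    # Each block stores [sum of indicators, number of observations in the block].
    blocks = []
    for d in d_sorted:
        blocks.append([d, 1])
        # Pool while the previous block mean exceeds the current block mean
        # (compared by cross-multiplication to avoid divisions).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, m = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += m

    # Expand block means back to one estimate per ordered observation.
    f_hat = np.concatenate([np.full(m, s / m) for s, m in blocks])
    return t_sorted, f_hat

# Toy example: five subjects examined at distinct ages.
ages = [50, 30, 70, 40, 60]
status = [0, 0, 1, 1, 0]
t, f_hat = npmle_current_status(ages, status)
survival = 1.0 - f_hat  # estimated probability of remaining free of impairment
print(t)       # [30. 40. 50. 60. 70.]
print(f_hat)   # 0 at age 30, 1/3 at ages 40 to 60, 1 at age 70
```

In the toy example the indicator sequence 0, 1, 0, 0, 1 is not monotone, so the three middle observations are pooled to their average of 1/3, which matches the max-min characterization of the estimator.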

Semiparametric Cox Proportional Hazards (PH) Regression
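Under the PH model, the hazard of renal impairment by the examination time, $t$, given covariate $Z$ is
$$\lambda(t; Z) = \lambda_0(t) \exp(\beta' Z), \qquad (2)$$
where $\lambda_0$ is the baseline hazard and $\beta$ is a $p \times 1$ vector of regression coefficients. Now (2) can be expressed as
$$S(t; Z) = \exp\{-L_0(t) \exp(\beta' Z)\},$$
where $L_0$ is the baseline CHF. The log-likelihood can be written as
$$l(\beta, L_0) = \sum_{i=1}^{n} \left[ \delta_i \log\{1 - S(t_i; Z_i)\} + (1 - \delta_i) \log S(t_i; Z_i) \right].$$
The estimates of $\beta$ and $L_0$ can be derived using the maximum likelihood approach described by Huang (1996), which iterates between the two parameters: starting from an initial value $\beta^{(0)}$ with $k = 0$, the log-likelihood is maximized with respect to $L_0$ for fixed $\beta^{(k)}$, and then with respect to $\beta$ using the Newton-Raphson method. Set $k = k+1$, and let $\beta^{(k)}$ be the maximizer. Repeat the steps until convergence.
The Cox PH regression model described above was applied to assess the effect of the covariates on the hazard of developing renal impairment. Goodness-of-fit of the model was assessed by plotting Cox-Snell residuals against cumulative hazard functions [23]; an approximately straight line would indicate a good fit. The proportionality assumption of the PH model was assessed by plotting the log estimated cumulative hazard for each group of a covariate together against time; approximately parallel plots would indicate that the PH assumption holds [18].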
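As an illustration, the current status log-likelihood under a PH model can be evaluated as in the sketch below. It assumes the standard form $S(t; Z) = \exp\{-L_0(t)\exp(\beta' Z)\}$, with each subject contributing $\log\{1 - S(t_i; Z_i)\}$ if impaired at examination and $\log S(t_i; Z_i)$ otherwise. The function name and arguments are hypothetical; `base_cumhaz` holds the baseline cumulative hazard $L_0$ evaluated at each subject's examination time.

```python
import numpy as np

def current_status_ph_loglik(beta, base_cumhaz, delta, Z):
    """Log-likelihood of current status data under a proportional hazards model."""
    beta = np.asarray(beta, dtype=float)
    Z = np.asarray(Z, dtype=float)
    delta = np.asarray(delta, dtype=float)

    # Subject-specific cumulative hazard: L_0(t_i) * exp(beta' Z_i).
    cumhaz = np.asarray(base_cumhaz, dtype=float) * np.exp(Z @ beta)

    # 1 - S = 1 - exp(-cumhaz), computed stably and kept away from log(0).
    fail_prob = np.clip(-np.expm1(-cumhaz), 1e-12, 1.0)

    # delta_i * log(1 - S_i) + (1 - delta_i) * log(S_i), where log(S_i) = -cumhaz_i.
    return float(np.sum(delta * np.log(fail_prob) - (1.0 - delta) * cumhaz))

# Toy check: with beta = 0 and L_0 = log(2), every subject has S = 0.5.
Z = np.array([[0.0], [1.0]])
loglik = current_status_ph_loglik([0.0], [np.log(2.0)] * 2, [1, 0], Z)
print(loglik)  # equals 2 * log(0.5)

# A hazard ratio is obtained by exponentiating a coefficient.
print(np.exp(0.5))  # hazard ratio for a hypothetical coefficient of 0.5
```

An alternating scheme would maximize this function over the baseline cumulative hazard for fixed $\beta$ and then over $\beta$ for the fixed baseline, repeating until the estimates stabilize.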

RESULTS
The average age of the study population was 45 years (male: 47 years, female: 40 years). The prevalences of DM and hypertension were 16% and 20.4% respectively.
Plots of the survival functions of renal impairment estimated by the NPMLE method, for all subjects and for the various covariate groups, are shown in Figure 1. The log-rank type statistics were significant for all three covariates. Fig-1a compares the NPMLE and Kaplan-Meier (KM) estimates of the cumulative survival function; it shows clearly that KM overestimated the probability of surviving kidney dysfunction. Fig-1b indicates a marked drop in survival probabilities for females after 40 years of age compared with their male counterparts (p-value <0.005). Individuals with DM (Fig-1d) and HTN (Fig-1c) showed significantly higher progression towards renal impairment (p-value <0.005).
Table-1 reports the results of the Cox PH regression model. Only the final model with significant covariates is presented. Consistent with the NPMLE results, females were about twice as likely to experience renal impairment as their male counterparts (HR=2.15; 95% CI: 1.83-2.52). A diabetic person was at significantly higher risk of developing kidney dysfunction (HR=1.8; 95% CI: 1.21-2.10), and a similar effect was observed for hypertensive subjects (HR=2.07; 95% CI: 1.51-2.21). Our model was a good fit to the observed sample, as evident from the straight-line plot of Cox-Snell residuals against cumulative hazards (Fig-2). The key assumption of proportionality of hazards at different covariate levels was also supported, as apparent from Fig-3.

DISCUSSION
We have tried to provide a novel way of exploiting available databases and to initiate a platform for further in-depth survival research in CKD. Our survival analysis of simulated data has provided evidence of increased risk of developing chronic renal impairment within various risk groups. Renal impairment in our study is defined as an eGFR of less than 60 ml/min/1.73 m². To our knowledge, ours is the first study to examine the risk of kidney impairment in India under a survival analysis set-up. Therefore, direct comparisons of our findings with other studies from the Indian subcontinent might not be appropriate from a methodological perspective. However, empirical comparisons can still lead to some important implications.
Our study revealed that females were about twice as likely as males to develop kidney impairment. The implications of our findings are stark realities lurking behind the changing landscape of India's socio-economic and behavioral paradigms. The magnitude of CKD will be of grave concern in the years to come, given the steady increase in two of its risk populations. The International Diabetes Federation (IDF) estimates that India's diabetic population will rise to 70 million by 2025, and the prevalence of hypertension increased steeply from 6.2% in 1959 to 30.9% in 2005 [26]. However, our results should be interpreted with caution. Our study population was generated through simulation using information reported in various population-based observational surveys. To obtain a comprehensive picture of the epidemiological progression of CKD in India, a follow-up study of CKD patients attending nephrologists in a hospital set-up would be more informative than our study in terms of evaluating the various clinical risk factors of CKD.
With a growing number of ESRD patients and in the absence of any health policy support from the government for treatment, CKD is gradually becoming a major public health crisis in India, which will eventually put a huge economic and disease burden on individuals as well as on health planning. Early interventions targeted at the various risk groups would help to prevent progression to ESRD and, eventually, dialysis. Inclusion of CKD in the target groups of future government health programs would contribute greatly towards achieving this goal.