Biological age for chronic kidney disease patients using index model

The estimation of biological age (BA) is an important asymptomatic measure that can be used to understand the physical changes and the ageing process of a living being. Factors that contribute towards profiling the human biological age can be diverse. Therefore, this study focuses on developing a BA model for patients with Chronic Kidney Disease (CKD). The procedure commences with the selection of significant biomarkers using a correlation test. Appropriate weighting is then assigned to each selected biomarker using the indexing method to produce a BA index. The BA index is matched to the age variation within the sample to acquire additional terms for the chronological age, leading ultimately to the estimated BA. From a sample of 190 patients (133 training data and 57 testing data) obtained from the University of Malaya Medical Centre (UMMC), Malaysia, the estimated BA is found to exceed the chronological age by three to nine years. Visual observations further validate the high similarities between the training and testing data sets.


INTRODUCTION
The estimation of biological age (BA) is becoming increasingly popular as an asymptomatic measure to understand changes in physical functionality as well as the ageing process of a living being. Unlike the chronological age (CA), which gives the exact period between an individual's birth and the present time, the BA is widely used to indicate healthy and unhealthy ageing through variables that contribute to healthspan (Kim & Jazwinski, 2015). Furthermore, BA shows the health state of each individual and serves as a comparative measure between individuals of the same age and gender (Kang et al., 2017). Thus, it describes one's lifetime behaviour informatively, given the common premise that the decline in physical functionality runs parallel to the deterioration of health and the increase in age.
Unhealthy individuals demonstrate only decreased function in CA, but functional BA is intended to represent different stages of ageing (Hayflick, 2007). Groups of individuals who perceived themselves to be in ill health exhibited higher biological ages than healthier groups. In this respect, those with a higher functional biological age have a higher chance of death because they reach a more advanced ageing stage earlier than others. Thus, the traditional approach of measuring individual health according to CA is less appealing in today's highly dynamic and rapidly changing global lifestyle.
Researchers have recently employed several statistical techniques to develop the BA model, primarily by incorporating multiple relevant biomarkers into the model. The BA model not only predicts ageing-related diseases but also considers the functional status during ageing (Jia et al., 2016). Several articles have been published on the measurement of BA using statistical methods. Multiple linear regression (MLR) remains one of the most widely used methods for calculating BA (Bae et al., 2008; Cho, Park & Lim, 2010; Jee, 2019; Jee & Park, 2017; Jia, Zhang & Chen, 2017; Levine, 2013; Nakamura & Miyao, 2007; Park et al., 2009; Krøll & Saxtrup, 2000). Nevertheless, MLR has been criticized for the risk of multicollinearity and for the tendency of its estimates to regress toward the mean (Cho, Park & Lim, 2010). This suggests that the MLR equation underestimates the BA of older individuals while overestimating the BA of younger individuals (Park et al., 2009). Principal component analysis (PCA) was proposed to overcome the disadvantages of MLR in the development of the BA formula (Kang et al., 2017; Cho, Park & Lim, 2010; Jee, 2019; Jia, Zhang & Chen, 2017; Levine, 2013; Nakamura & Miyao, 2007; Park et al., 2009). However, PCA cannot avoid some of the statistical deficiencies of MLR (Klemera & Doubal, 2006). An alternative, the Klemera & Doubal Method (KDM), provides better precision in estimating BA than the MLR and PCA methods (Cho, Park & Lim, 2010; Jee, 2019; Jia, Zhang & Chen, 2017; Levine, 2013). Although KDM gives the most reliable estimates in BA prediction, it involves complex calculations (Cho, Park & Lim, 2010).
This study focuses on developing the BA model for patients with Chronic Kidney Disease (CKD). The public-health effect of mortality due to this disease has not been fully assessed (Wen et al., 2008). Furthermore, CA does not give a good reflection of the time-dependent changes in kidney function (Rowland et al., 2018). To better understand an individual's degree of ageing or life span, and how CKD influences that degree of ageing, a new approach needs to be developed.
In this study, we develop the BA using the indexing method. An index number is a widely used statistical tool for measuring changes in a set of data points, as well as for summarizing and ranking a particular data set. Moreover, measuring BA through an index number keeps track of the original representation of the data and thus ensures that the output closely resembles the empirical structure. During the indexing process, each selected biomarker is given a unique treatment corresponding to its severity level. Visual observations are also presented to justify the appropriateness of the method used.

IRB/Ethics approval
The use of the data in this study was approved by the Medical Research Ethics Committee, University of Malaya Medical Centre (MREC ID NO 2018428-6258). The committee granted permission to carry out the study within its facilities under common terms, including adherence to the instructions, guidelines and requirements of the committee. The Patient Information Sheet and Consent Form were waived by the committee. This retrospective study takes a new look at data from earlier initiated and completed studies.

Measure of correlation
The strength of the relationship between two variables can be measured using Pearson's correlation coefficient (Wackerly, Mendenhall & Scheaffer, 2014). A representative measure for this, the r-value, signifies both the magnitude and direction of the strength, that is, a value closer to ±1 indicates higher strength in the positive or negative direction. The r-value can be computed as:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} \quad (1) $$
where x and y are the values of the two variables and n is the total number of samples. This study considers the relationships of 10 biomarkers with the CA: height, weight, gender, BMI, creatinine, eGFR, BP systolic, BP diastolic, CTCA calcium score and CKD stage from CKD patients. BA biomarkers that have an absolute r-value greater than 0.15 were selected for inclusion in the BA calculation (Jee, 2019; Park et al., 2009).
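As a minimal sketch of this selection step, the following Python computes Pearson's r from the standard sums-of-products formula and keeps the biomarkers whose absolute r with CA exceeds 0.15. The function names and the toy readings are illustrative assumptions, not the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

def select_biomarkers(age, biomarkers, threshold=0.15):
    """Return {name: r} for biomarkers whose |r| with age exceeds the threshold."""
    selected = {}
    for name, values in biomarkers.items():
        r = pearson_r(age, values)
        if abs(r) > threshold:
            selected[name] = r
    return selected

# Toy readings for five hypothetical patients (not the UMMC data).
age = [40, 50, 60, 70, 80]
biomarkers = {
    "eGFR": [60, 52, 47, 38, 30],         # falls steadily with age
    "height": [165, 170, 160, 170, 165],  # unrelated to age in this toy set
}
selected = select_biomarkers(age, biomarkers)
```

In this toy set only eGFR passes the threshold, because the height readings are uncorrelated with age by construction.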

Weighted average method
This study proposes a weighted average method to estimate the weight for each significant biomarker. Note that the weight ranges from 0 to 1. A higher weight signifies a stronger association between the health biomarker and the BA index. The weight for biomarker i is computed as follows:
$$ w_i = \frac{|r_i|}{\sum_{j=1}^{n} |r_j|} \quad (2) $$
where $r_i$ is the correlation coefficient of the i-th biomarker computed using Eq. (1) and n is the total number of significant biomarkers. Note that the sum of all the $w_i$ equals one, $\sum_{i=1}^{n} w_i = 1$.
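The weighting step can be sketched as follows, assuming each weight is the biomarker's absolute correlation with CA divided by the sum of the absolute correlations, which satisfies the stated properties (weights between 0 and 1 that sum to one). The BMI and eGFR r-values are the ones reported in this study; the CTCA value and the function name are placeholders for illustration only.

```python
def correlation_weights(r_values):
    """Normalize absolute correlations so that the weights sum to one."""
    total = sum(abs(r) for r in r_values.values())
    return {name: abs(r) / total for name, r in r_values.items()}

r_values = {
    "BMI": -0.173,   # reported in the text
    "eGFR": -0.233,  # reported in the text
    "CTCA": 0.25,    # placeholder value, not taken from the study
}
weights = correlation_weights(r_values)
```

Taking absolute values matters here: several biomarkers correlate negatively with age, and signed values would produce negative weights.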

Indexing method
This study uses the indexing method for BA calculation. The index produced from this method represents the amount of change with respect to the base value. For each biomarker, the base value is set to be the normal value or the favourable health condition. Thus, each biomarker has a unique indexing assignment based on the medical measurement it carries. In brief, the index for health biomarker i (Eq. (3)) is computed from the measured value, which is the current health reading of the patient, and the normal value, which is the normal reading level for that biomarker. Table 1 summarizes several common reading levels based on standard clinical practice as well as work reported in the literature. The health biomarkers are categorized into several reading levels based on the severity of the postulated measurements. Index equations are then developed based on Table 1. Note that the normal reference value is taken as the mid-point of the normal reading level. The index equations for the BMI, the systolic blood pressure, the diastolic blood pressure, the eGFR and the CTCA are given by Eqs. (4) to (8), respectively. Eqs. (4) to (8) formulate the framework for the BA index calculation, in a manner similar to how medical practice detects the severity level, for the five biomarkers. Figures 1A to 1E show the aforementioned state graphically. Note that each health biomarker index ranges from 0 to 1, where 0 indicates a favourable level of the health biomarker, a value between 0 and 1 indicates a deteriorating health condition, and a value of 1 indicates a critically ill stage. The medical measures for each of the biomarkers are unique. Therefore, the index value can be a useful comparative tool across these measurements.
In order to produce the biological index for each individual, each health biomarker index is multiplied by its corresponding weight. The weight proportionately signifies the contribution of each biomarker index to the BA index. Mathematically, the overall index for individual x is computed as follows:
$$ I_{\mathrm{BioAge},x} = \sum_{i=1}^{n} w_i \, \mathrm{Index}_{i,x} \quad (9) $$
where $w_i$ and $\mathrm{Index}_{i,x}$ are the weight and index for the i-th health biomarker of individual x, respectively.
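The weighted sum described above can be sketched in a few lines of Python. The helper `clipped_index` is a hypothetical stand-in for the per-biomarker index equations: it assumes a linear scale that is 0 at the normal value and 1 at a critical value, clipped to [0, 1]. It is not the study's Eqs. (4) to (8), and all numeric cut-offs and weights below are placeholders rather than values from Table 1 or Table 5.

```python
def clipped_index(measured, normal, critical):
    """Hypothetical index: 0 at the normal value, 1 at the critical value,
    linear in between and clipped to the range [0, 1]."""
    raw = (measured - normal) / (critical - normal)
    return min(1.0, max(0.0, raw))

def ba_index(weights, indices):
    """Overall BA index: weighted sum of the per-biomarker indices."""
    return sum(weights[name] * indices[name] for name in weights)

# Placeholder weights and cut-offs for one hypothetical patient.
weights = {"BMI": 0.5, "eGFR": 0.5}
indices = {
    "BMI": clipped_index(measured=30, normal=20, critical=40),   # higher is worse
    "eGFR": clipped_index(measured=45, normal=90, critical=15),  # lower is worse
}
overall = ba_index(weights, indices)
```

Because every per-biomarker index lies in [0, 1] and the weights sum to one, the overall index also stays in [0, 1], which keeps biomarkers with different units comparable.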

BA estimation
Several methods have been proposed to estimate the BA. Among these are multiple linear regression (MLR) (Jia, Zhang & Chen, 2017) and principal component analysis (PCA) (Nakamura & Miyao, 2007). The PCA method is derived from MLR and reduces the effects of underestimated or overestimated BA (Jia, Zhang & Chen, 2017). Both methods show a linear relationship between BA and health parameters. The BA index model developed for predicting BA in this study also follows a linear relationship for individual general health status. It is developed based on the mathematical settings of the index method. The method suggests combining all individual subcomponent indices into one principal component (in this case, all health biomarkers into a single BA index). The subcomponent index measures the changes for each representative group of the biomarkers from the CA. Note, however, that the value of the index is not expressed in years. A common approach to translating it into a meaningful unit of years is the equation $BA_x = (I_{\mathrm{BioAge},x} \times \text{standard deviation}) + \text{mean of CA}$, where $I_{\mathrm{BioAge},x}$ is the BA index of individual x. Because the sample data focuses on individual kidney patients, the following adjustment was made to $BA_x$:
$$ BA_x = (I_{\mathrm{BioAge},x} \times SD) + CA_x \quad (10) $$
where $BA_x$ and $CA_x$ are the biological age and chronological age for individual x, respectively. The standard deviation SD is computed based on the chronological age of the sample.
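A minimal sketch of the conversion from index to years follows. It assumes the adjustment replaces the sample mean of CA with the patient's own CA, so that the BA is the CA plus the index scaled by the standard deviation of CA in the sample. The function name, the use of the sample (n − 1) standard deviation and the toy ages are assumptions for illustration.

```python
import statistics

def estimate_ba(ca, index, sd_ca):
    """Biological age: chronological age plus the BA index scaled by the SD of CA."""
    return ca + index * sd_ca

# Toy chronological ages (not the UMMC data); sample SD is an assumption.
sample_ca = [35, 50, 64, 70, 82]
sd_ca = statistics.stdev(sample_ca)

# A hypothetical 64-year-old patient with an overall BA index of 0.6.
ba = estimate_ba(ca=64, index=0.6, sd_ca=sd_ca)
gain = ba - 64
```

Under this form an index of 0 returns the CA unchanged and an index of 1 adds one full standard deviation of CA, so the gain in BA is always non-negative.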

Bland-Altman analysis
It is vital to observe the mean values to understand the nature of a prediction model. It follows that the degree of dispersion from the mean indicates the fitness of individual BA values (Jee & Park, 2017). Degrees of dispersion between BA and CA are commonly presented using Bland-Altman plots. A Bland-Altman plot shows the difference between CA and BA against the mean of the two measurements, which makes it easier to gauge the magnitude of the differences, spot outliers and see the trend in the data (Altman & Bland, 1983). If the differences are normally distributed, 95% of them should lie between $d - 1.96s$ and $d + 1.96s$ (the 95% confidence interval), where d is the estimated mean difference and s is the standard deviation of the differences (Giavarina, 2015).
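The quantities behind a Bland-Altman plot can be sketched as below: the per-patient differences and means, the mean difference d, the standard deviation s, and the limits d ± 1.96s. The function name, the returned dictionary keys and the toy ages are illustrative assumptions; plotting is left out to keep the sketch dependency-free.

```python
import statistics

def bland_altman(ca, ba):
    """Return the differences (CA - BA), pairwise means and 95% limits."""
    diffs = [c - b for c, b in zip(ca, ba)]
    means = [(c + b) / 2 for c, b in zip(ca, ba)]
    d = statistics.mean(diffs)
    s = statistics.stdev(diffs)
    lower, upper = d - 1.96 * s, d + 1.96 * s
    inside = sum(lower <= x <= upper for x in diffs) / len(diffs)
    return {
        "diffs": diffs,
        "means": means,
        "mean_diff": d,
        "lower": lower,
        "upper": upper,
        "fraction_inside": inside,
    }

# Toy ages for four hypothetical patients whose BA exceeds their CA.
result = bland_altman(ca=[50, 60, 70, 80], ba=[55, 67, 78, 86])
```

Since the difference is taken as CA minus BA, every difference is negative whenever the BA exceeds the CA.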

Data
The study population consisted of 190 patients with stage 1 to stage 4 CKD. Patients were recruited from the inpatient clinic, University of Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia. Chronological age varied from 35 to 82 years, and the study population included both males (115 patients) and females (75 patients). The age range was chosen to ensure that the population was old enough to be experiencing age-related changes in biomarkers.
Physical measurements include gender, height, weight, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), CTCA Calcium Score (CTCA), creatinine, CKD stage and eGFR. The CKD stages observed in the data ranged from 2 to 5. More specifically, 4% of patients were in stage 2, 50% were in stage 3, 38% were in stage 4 and 8% were in stage 5. Table 2 shows the mean, standard deviation (SD) and data range for the CKD patients. To validate the BA model, 70% of the data were used as the training set (i.e., 133 patients) while the remaining 30% were used as the testing set (i.e., 57 patients).
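The 70/30 partition can be sketched as follows. The text does not state how patients were assigned to each set, so the seeded random shuffle here is an assumption, as are the function name and the use of integer identifiers in place of patient records.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 190 placeholder patient identifiers standing in for the real records.
patients = list(range(190))
train, test = train_test_split(patients)
```

With 190 records and a training fraction of 0.7, this yields the 133 and 57 records quoted above.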

Findings
The development of BA using the indexing method involved several steps: correlation analysis, computation of the weighted average for the selected health biomarkers, construction of BA indices from the index equations and estimation of BA based on sample variations. Biomarkers that show an absolute correlation coefficient exceeding 0.15 were selected for BA estimation. An associated significance test that examined the correlation between the biomarkers and the chronological age for the training data set is summarized in Table 3. Seven biomarkers were found significant for inclusion in the BA estimation. It was observed that creatinine, weight, BMI, eGFR, BP systolic and BP diastolic decreased as age increased, while only the CTCA calcium score increased with age. Height, CKD stage and gender had a low correlation with age and were thus excluded from the BA estimation.
Note that some biomarkers had similar clinical implications and were highly inter-correlated, which indicates redundancy. To cater for this redundancy, the one biomarker with the more substantial significance was selected from each such pair. It is observed from Table 4 that weight and BMI both had an absolute correlation coefficient with age of more than 0.15 (−0.234 and −0.173, respectively) and showed high correlation with each other (0.787). Because the difference in their correlation with age was marginal and BMI has greater clinical significance, BMI was selected for BA estimation. Furthermore, BMI is a more reliable indicator of whether an individual's weight is normal, overweight or obese (North American Association for the Study of Obesity, 2000). A similar procedure was used to choose between creatinine and eGFR. Both biomarkers represent renal function. Creatinine and eGFR showed moderate inter-correlation (0.641) in addition to absolute correlation coefficients with age of more than 0.15 (−0.202 and −0.233, respectively). Therefore, eGFR was selected for the BA estimation. With respect to its significance, measurement of eGFR is the most reliable assessment of renal function in CKD (Bostom, Kronenberg & Ritz, 2002), and it is used as an index of renal function in clinical practice (Perrone, Madias & Levey, 1992). The weight for each parameter was derived from the correlation analysis: the higher the correlation, the higher the weight for the BA parameter. As shown in Table 5, BP diastolic, CTCA and eGFR were the three highest contributors to the index value for predicting the BA.
Accordingly, five biomarkers, namely BMI, CTCA calcium, eGFR, BP systolic and BP diastolic, were selected to estimate the BA index. The weight for each of the selected biomarkers was then computed using Eq. (2) to arrive at the BA index given in Eq. (11). Table 6 summarizes the estimated BA based on the BA index of Eq. (11). It is evident that all estimated BAs were higher than their corresponding CAs for CKD patients. Note, however, that the increase in BA varied for patients with identical CKD stages, which acknowledges other competing factors that account for the difference. It was observed that the gain in BA for the 133 patients in the training data set ranged from 3 to 9 years, with a mean of 7 years. Overall, the BA of patients suffering from kidney disease was 5% to 16% higher than their CA. Figure 2 shows that for about 65% of kidney patients in stage 3 and stage 4, the BA was 9% to 12% higher than the CA. This indicates that, on average, CKD patients at these stages gain between 5 and 9 years over their CA biologically.
The Bland-Altman plot in Fig. 3 exhibits the differences between CA and BA against the mean, with a 95% confidence interval. All points lie below zero because this study utilizes data for patients diagnosed with CKD. Note, however, that the plot does not indicate whether the limits are acceptable or not; that judgment is based on clinical necessity. It is observed that most of the points lie within the 95% confidence interval (CI), that is, they are inside the limits of agreement (LOA). To explain further, the differences between CA and BA are plotted against the z-scores of a standard normal distribution in Fig. 4. The majority of the points fall along the straight line, which suggests that the differences between CA and BA follow the normal distribution. Furthermore, the correlation was found to be 0.9715, and the kurtosis value (0.170) indicates that the tailedness of the distribution was close to that of a normal distribution. In addition, the skewness value of 0.395 lies between −0.5 and 0.5, that is, it is nearly zero, which supports the assumption of a normal distribution.
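The normality checks mentioned above can be sketched with the Python standard library. The estimators are assumptions, since the text does not state which definitions were used: skewness and excess kurtosis are computed from population moments, and the normal-plot correlation pairs the sorted differences with standard normal quantiles at plotting positions (i − 0.5)/n. All function names and the toy differences are illustrative.

```python
import math
import statistics

def central_moment(values, k):
    """k-th central moment using the population (divide by n) convention."""
    m = statistics.mean(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def qq_correlation(values):
    """Correlation between the sorted data and standard normal quantiles."""
    n = len(values)
    ordered = sorted(values)
    normal = statistics.NormalDist()
    z = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = statistics.mean(ordered), statistics.mean(z)
    sxz = sum((x - mx) * (q - mz) for x, q in zip(ordered, z))
    sxx = sum((x - mx) ** 2 for x in ordered)
    szz = sum((q - mz) ** 2 for q in z)
    return sxz / math.sqrt(sxx * szz)

# Toy, perfectly symmetric differences (not the study's CA - BA values).
diffs = [-2, -1, 0, 1, 2]
summary = (skewness(diffs), excess_kurtosis(diffs), qq_correlation(diffs))
```

A skewness near zero, an excess kurtosis near zero and a normal-plot correlation near one are all consistent with normally distributed differences; statistical packages may apply small-sample corrections that give slightly different values.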
To further validate the BA index model, the testing dataset with 57 CKD patients was examined. Figure 5 shows the proximity between the BA for the testing dataset and the BA for the training dataset. The BA of CKD patients increases as the CA increases due to the severity of the biomarker measurements. On average, the age of patients diagnosed with CKD (stage 1 to stage 4) increases by 7 years from their CA (64 years old) to their BA (71 years old). Nevertheless, we acknowledge the small sample size and its limited representation of the population as limitations of the study.

CONCLUSIONS
Recent studies have shown the importance of assessing functional biomarkers for predicting ageing-related diseases. Ageing-related disease influences the BA of a person. Several statistical approaches have been used to construct the BA model, namely MLR, PCA and KDM (Levine, 2013). Each method has its own deficiencies in the measurement of BA. Therefore, the indexing method is proposed to address these issues, especially redundant biomarkers (through manual examination of redundant biomarkers and selection of the relevant biomarker by experience), over- or underestimated BA for a particular age group (by concentrating on a particular disease group), and complicated calculations (through a tractable computation for each biomarker).
The estimation of BA using the indexing method proposed in this study facilitates a tractable form for each health biomarker. The initially calculated BA index represents the illness severity level for the CKD patients which contributes to proportionate gain in the BA. The level of severity is categorized into high risk, normal risk and at risk so that each patient may be assessed individually. Ten biomarkers were examined to see their appropriateness for inclusion in the BA estimation.
The results of this study show that patients with stage 2 to stage 5 CKD experience a gain in BA of between 3 and 9 years. This finding may serve medical practitioners as a precaution for treatment in addition to current measurement facilities, as it provides a comprehensive reading of the manifold biomarkers. The result is further validated using the testing data and visual observations. Nonetheless, increasing the sample size and including a more diverse population may enhance the reliability of the end result.
Besides its practicality in the medical field, BA can be potentially useful in many areas, including the insurance industry, where age and health play a central role in premium calculation. Mortality projection based on the BA can be an exciting exploration of human lifetime behaviour that incorporates health biomarkers. In addition, an interesting direction for future work is to develop methods that enable a fair comparison between the various biological age approaches.

Author Contributions
Noriszura Ismail performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft. Wan Ahmad Hafiz Wan Md Adnan performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Human Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): The Medical Research Ethics Committee, University of Malaya Medical Centre granted approval to carry out the study within its facilities (MREC ID NO 2018428-6258).

Data Availability
The following information was supplied regarding data availability: The raw data is available in the Supplemental File.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.13694#supplemental-information.