Approximation of Glomerular Filtration Rate after 1 Year Using Annual Medical Examination Data

Background: This cohort study was conducted to devise an approximation formula for predicting the glomerular filtration rate (GFR) after 1 year using annual medical examination data from the general population. Methods: Consecutive annual medical examination data were obtained for 41,337 inhabitants. Machine learning with the random forest method was used to assess the importance of each clinical parameter in terms of its association with estimated GFR (eGFR) after 1 year. An approximation formula was developed by multiple linear regression analysis based on the four most important clinical parameters. The relationship between the GFR after 1 year approximated by our formula and the eGFR after 1 year was analyzed using Pearson’s correlation coefficient. Results: The following approximation formula was obtained by multiple linear regression analysis: approximate GFR after 1 year (mL/min/1.73 m2) = −0.054 × age + 0.162 × hemoglobin − 0.085 × uric acid + 0.849 × eGFR + 11.5. The approximate GFR after 1 year was significantly and strongly correlated with the eGFR at that time (r = 0.884; p < 0.001). Conclusions: An approximation formula including age, hemoglobin, uric acid, and eGFR may be useful for predicting GFR after 1 year among members of the general population.


Introduction
Chronic kidney disease (CKD) is one of the most common diseases worldwide; its incidence and prevalence continue to increase each year [1]. CKD progression is associated with increased risks of mortality and cardiovascular events [2,3]. Therefore, early identification of individuals at high risk of renal function decline is important for early intervention to prevent CKD progression.

Ethical Approval
This study was approved by the ethics committee of the Omiya Medical Association on 1 April 2016 (approval number: 2016001) and performed in accordance with the ethical principles outlined in the Declaration of Helsinki. The ethics board waived the requirement for informed consent because of the retrospective nature of the study. Instead, information regarding this study was displayed on notice boards in relevant institutions to inform all participants of their right to opt out.

Study Participants
We collected annual medical examination data of residents in Saitama City, Japan, from 2011 to 2019. All examinations were performed at primary care clinics and community hospitals in Saitama City. The inclusion criteria were age ≥20 years and participation in an annual medical examination for ≥2 consecutive years from 2011. The exclusion criteria were hemodialysis, peritoneal dialysis, and renal transplantation.

Study Design
This investigation was a retrospective population-based cohort study that utilized annual medical examination data for residents in Saitama City, Japan. Consecutive annual medical examination data from 2011 to 2019 were obtained by retrospective review of a medical database provided by Saitama City. The importance of each clinical parameter in terms of its association with the eGFR after 1 year was assessed by machine learning using the random forest method. An approximation formula for predicting GFR after 1 year was developed by multiple linear regression analysis based on the four most important clinical parameters. The relationship between the GFR after 1 year approximated by our formula and the eGFR after 1 year was analyzed using Pearson's correlation coefficient. Agreement between the approximate GFR after 1 year and the eGFR after 1 year was analyzed using Bland-Altman analysis.

Laboratory Methods
Blood and urine parameters were measured by commercial or hospital laboratories. eGFR was calculated using a modified version of the Modification of Diet in Renal Disease formula from the Japanese Society of Nephrology: $\mathrm{eGFR}\ (\mathrm{mL/min/1.73\ m^2}) = 194 \times \mathrm{age}^{-0.287} \times \mathrm{serum\ creatinine}^{-1.094}$ (multiplied by 0.739 for women) [23]. Blood pressure was measured with the participant in a sitting position at rest, using an automated upper arm cuff. Annual changes in clinical parameters were determined by subtracting the baseline values from the values after 1 year.
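The eGFR equation and the definition of annual change described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function and parameter names are hypothetical, and the unit of serum creatinine (mg/dL) is an assumption, as it is not stated in the text.

```python
def egfr_japan(age_years: float, serum_creatinine_mg_dl: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m^2) by the Japanese Society of Nephrology equation.

    Assumption: serum creatinine is expressed in mg/dL.
    """
    if age_years <= 0 or serum_creatinine_mg_dl <= 0:
        raise ValueError("age and serum creatinine must be positive")
    egfr = 194.0 * (age_years ** -0.287) * (serum_creatinine_mg_dl ** -1.094)
    # The equation is multiplied by 0.739 for women.
    return egfr * 0.739 if female else egfr


def annual_change(baseline_value: float, value_after_1_year: float) -> float:
    """Annual change = value after 1 year minus baseline value."""
    return value_after_1_year - baseline_value
```

For example, `annual_change(egfr_japan(64, 0.8, False), egfr_japan(65, 0.9, False))` is negative, reflecting a decline in eGFR when serum creatinine rises over the year.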

Statistical Analyses
Data processing, machine learning implementation, and statistical analyses were performed using the KNIME Analytics Platform version 4.7.1 (KNIME, Zurich, Switzerland). We selected the clinical parameters that were available in the annual medical examination dataset of residents in Saitama City and were reported to be associated with renal function decline in previous studies (Table 1) [4–22]. The final dataset included the following 14 variables: male sex, age, smoking, BMI, SBP, DBP, hemoglobin, uric acid, triglyceride, HDL-C, LDL-C, hemoglobin A1c, urinary protein, and eGFR. Urinary protein was expressed semi-quantitatively (grade 1, −; grade 2, ±; grade 3, 1+; grade 4, 2+; grade 5, ≥3+) because urinalysis was performed with a urine dipstick test in the annual medical examination. Data were expressed as means ± standard deviations for continuous variables, and as counts and percentages for categorical variables.
For the random forest analysis, data were normalized and divided into a training set and a test set. We used 80% of the data for training and the remaining 20% to validate the results. Afterward, the importance of each parameter was assessed using a machine learning-based random forest method, sorted in descending order, and illustrated using a bar chart. Next, test data were applied to the prediction model and the accuracy of the model was evaluated (Figure 1). The first dataset included 14 independent variables: sex, age, smoking history, BMI, SBP, DBP, hemoglobin, uric acid, triglyceride, HDL-C, LDL-C, hemoglobin A1c, urinary protein, and eGFR. The second dataset included 14 independent variables: sex, age, annual change in smoking history, annual change in BMI, annual change in SBP, annual change in DBP, annual change in hemoglobin, annual change in uric acid, annual change in triglyceride, annual change in HDL-C, annual change in LDL-C, annual change in hemoglobin A1c, annual change in urinary protein, and eGFR. Annual changes in sex, age, and eGFR were not used as independent variables because sex cannot be changed, the annual change in age was 1 year for all participants, and annual change in eGFR was the dependent variable.
Based on a previous report showing that four-variable and eight-variable equations have similar predictive accuracy [24], the four most important parameters in terms of associations with eGFR after 1 year, according to the random forest method, were included in multiple linear regression analysis. Data were again divided into a training set (80%) and a test set (20%). Next, an approximation formula was established by multiple linear regression analysis using the forced entry method. After test data were applied to the approximation formula, missing values were excluded and the accuracy of the formula was evaluated (Figure 2). We performed two sets of multiple linear regression analyses: one focused on clinical parameters and the other focused on annual changes in clinical parameters. Correlation between the approximate GFR after 1 year and the eGFR after 1 year was evaluated using Pearson's correlation coefficient. Agreement between the approximate GFR after 1 year and the eGFR after 1 year was assessed using Bland-Altman analysis. p values < 0.05 were considered statistically significant.
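The dipstick grading and the 80%/20% partition described in this section can be sketched in plain Python. The analyses in this study were run in KNIME, so the code below is only an illustrative stand-in; the names `DIPSTICK_GRADE` and `split_train_test`, the random shuffling, and the seed are assumptions.

```python
import random

# Semi-quantitative urinary protein grades as defined in the text.
DIPSTICK_GRADE = {"−": 1, "±": 2, "1+": 3, "2+": 4, "≥3+": 5}


def split_train_test(records, train_fraction=0.8, seed=0):
    """Randomly partition records into a training set and a test set.

    Assumption: the split is a simple random partition; the text does not
    state how KNIME was configured to draw it.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Usage with hypothetical record identifiers.
train, test = split_train_test(range(100))
```

Fixing the seed makes the partition reproducible, which matters when the same split is reused to compare models.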

Participant Characteristics
The baseline characteristics of the study participants are summarized in Table 2. In total, 349,050 records were obtained for 41,337 participants (16,918 men, 24,419 women; mean age: 64.0 ± 6.9 years; BMI: 22.8 ± 3.2 kg/m²). Four thousand five hundred and sixty-one participants (11.0%) had a past or current history of smoking. The mean SBP and DBP were 128.2 ± 16.2 mmHg and 76.3 ± 10.6 mmHg, respectively. The mean hemoglobin level ...



Importance of Clinical Parameters in Terms of Associations with eGFR after 1 Year
Figure 4 shows the importance of each parameter in terms of its association with the eGFR after 1 year. We performed multiple linear regression analysis using the four most important parameters according to the random forest method. This analysis revealed that age (coefficient (β) = −0.054, p < 0.001), hemoglobin (β = 0.162, p < 0.001), uric acid (β = −0.085, p < 0.001), and eGFR (β = 0.849, p < 0.001) were independently correlated with the eGFR after 1 year (Table 3 (A)).
We also performed multiple linear regression analysis using the four most important annual changes in parameters according to the random forest method. This analysis revealed that age (β = −0.050, p < 0.001), annual change in hemoglobin (β = −0.398, p < 0.001), annual change in uric acid (β = −3.205, p < 0.001), and eGFR (β = 0.864, p < 0.001) were independently correlated with the eGFR after 1 year (Table 3 (B)).

Formula for Approximation of GFR after 1 Year
An approximation formula was developed using variables that showed a significant correlation with the eGFR after 1 year according to multiple linear regression analysis. We developed two formulas: one focused on clinical parameters and the other focused on annual changes in clinical parameters.
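The formula based on clinical parameters, as stated in the abstract (approximate GFR after 1 year = −0.054 × age + 0.162 × hemoglobin − 0.085 × uric acid + 0.849 × eGFR + 11.5), can be written as a short function. The function and parameter names are hypothetical, and the units of hemoglobin (g/dL) and uric acid (mg/dL) are assumptions because they are not stated in the text. The formula based on annual changes is not reproduced here because its complete form does not appear in the text.

```python
def approximate_gfr_after_1_year(
    age_years: float,
    hemoglobin_g_dl: float,
    uric_acid_mg_dl: float,
    egfr: float,
) -> float:
    """Approximate GFR after 1 year (mL/min/1.73 m^2), clinical-parameter formula.

    Coefficients are those reported in the abstract. Assumed units:
    hemoglobin in g/dL, uric acid in mg/dL, eGFR in mL/min/1.73 m^2.
    """
    return (
        -0.054 * age_years
        + 0.162 * hemoglobin_g_dl
        - 0.085 * uric_acid_mg_dl
        + 0.849 * egfr
        + 11.5
    )


# Usage with hypothetical values for a single participant.
predicted = approximate_gfr_after_1_year(64, 14.0, 5.0, 70.0)
```

Because the eGFR coefficient is 0.849 and the remaining terms are small, the predicted value stays close to the current eGFR across the range of values typical for this cohort.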

Correlation between the Approximate GFR after 1 Year and the eGFR after 1 Year
The approximate GFR after 1 year calculated by our formula with clinical parameters (Formula (1)) was significantly and strongly correlated with the eGFR at that time (r = 0.884; p < 0.001). The approximate GFR after 1 year calculated by our formula with annual changes in clinical parameters (Formula (2)) was also significantly and strongly correlated with the eGFR at that time (r = 0.894; p < 0.001).
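For reference, Pearson's correlation coefficient between approximate and estimated values can be computed as in the sketch below. The function name `pearson_r` is hypothetical, and this is not the KNIME node used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A high r indicates a strong linear association but does not by itself show that the two values agree, which is why agreement is assessed separately with Bland-Altman analysis.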

Agreement between the Approximate GFR after 1 Year and the eGFR after 1 Year
Bland-Altman analysis showed moderate agreement between the approximate GFR after 1 year calculated by our formula with clinical parameters and the eGFR at that time. In total, 97.7% of the points were included within the mean difference ± 1.96 standard deviations (0.0 ± 12.8) (Figure 5). This analysis also showed moderate agreement between the approximate GFR after 1 year calculated by our formula with annual changes in clinical parameters and the eGFR at that time. In total, 97.6% of the points were included within the mean difference ± 1.96 standard deviations (0.0 ± 12.1).
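The quantities reported here (mean difference, limits of agreement at ± 1.96 standard deviations, and the proportion of points inside those limits) can be derived as in the following sketch. The function name is hypothetical, the input values are made up for illustration, and use of the sample standard deviation is an assumption.

```python
import statistics


def bland_altman_summary(approximate, reference):
    """Return (mean difference, lower limit, upper limit, fraction within limits).

    Assumption: limits of agreement use the sample standard deviation
    of the paired differences.
    """
    differences = [a - r for a, r in zip(approximate, reference)]
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)
    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff
    within = sum(lower <= d <= upper for d in differences) / len(differences)
    return mean_diff, lower, upper, within


# Usage with hypothetical paired values (mL/min/1.73 m^2).
summary = bland_altman_summary([60, 72, 81, 90], [62, 70, 80, 92])
```

Under this reading, "0.0 ± 12.8" corresponds to a mean difference of 0.0 with limits of agreement from −12.8 to 12.8 mL/min/1.73 m².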

Intercept: 11.9 (p < 0.001). Coefficients represent the changes in the dependent variable per unit change in the independent variables.


Discussion
In the present study, we found that age, hemoglobin, uric acid, and eGFR were associated with the eGFR after 1 year among members of the general population in Japan. We also found that age, annual change in hemoglobin, annual change in uric acid, and eGFR were associated with the eGFR after 1 year in this population. These findings enabled us to develop an approximation formula for predicting GFR after 1 year; the approximate GFR calculated by this formula was strongly correlated with the eGFR after 1 year. However, as shown in Figure 5, the discrepancy between the approximate GFR after 1 year and the eGFR after 1 year widened as the eGFR after 1 year increased. It has been reported that the discrepancy between GFR measured by inulin clearance and eGFR calculated by the Modification of Diet in Renal Disease formula also widens as the GFR increases [23]. These findings suggest that the development of an approximation formula for predicting the GFR after 1 year might be challenging among members of the general population.
A higher serum uric acid concentration is reportedly associated with a more rapid decline in creatinine clearance among patients with CKD G1-4 [25]. An observational study showed that an increased serum uric acid concentration was associated with a higher incidence of end-stage kidney disease among patients with CKD G3-4 [26]. Several observational studies involving members of the general population showed that a change in the serum uric acid concentration was negatively associated with a change in the eGFR [15,27]. In the present study, the eGFR after 1 year was negatively associated with the serum uric acid concentration and the annual change in serum uric acid concentration among members of the general population. These results suggest that the serum uric acid concentration is negatively associated with changes in renal function in the population both with and without CKD.
There is evidence that the eGFR declines more rapidly as the hemoglobin concentration decreases among patients with CKD G2-5 [13]. In an observational study, a decreased hemoglobin concentration was associated with a higher incidence of end-stage kidney disease among patients with CKD G4-5 [28]. Several observational studies of healthy individuals showed that a lower hemoglobin concentration was associated with more rapid eGFR decline [12,29]. In the present study, the hemoglobin concentration was positively associated with the eGFR after 1 year among members of the general population. These findings suggest that the hemoglobin concentration is positively associated with changes in renal function in the population both with and without CKD. However, the present study also showed that the annual change in hemoglobin concentration was negatively associated with the eGFR after 1 year. Mild renal dysfunction is suspected to enhance renal erythropoietin production in early-stage CKD [30]. A longitudinal cohort study revealed that eGFR decline was associated with an increase in hemoglobin concentration among individuals with normal or mildly decreased renal function [31]. In the present study, the change in hemoglobin concentration was inversely correlated with the eGFR after 1 year among members of the general population with preserved renal function, consistent with the previous findings [31]. Further studies are needed to elucidate the pathogenesis involved in the inverse relationship between the change in renal function and the change in hemoglobin concentration among members of the general population.
In living kidney donors, the GFR declines with increasing age [32]. An observational study involving the general population indicated that eGFR decreased as age increased [22]. In the present study, age was negatively associated with the eGFR after 1 year among members of the general population. These findings suggest that age is negatively associated with changes in renal function in the population both with and without CKD.
In the general population, there is evidence that eGFR decline accelerates as eGFR decreases [33]. In the present study, eGFR was positively associated with the eGFR after 1 year among members of the general population. These results suggest that current renal function data are essential for predicting subsequent changes in renal function among members of the general population.
A large population-based cohort study involving 120,727 individuals in Japan reported that the prevalences of CKD G3 and G4-5 were 21.35% and 0.01%, respectively [33]. In the present study, the prevalences of CKD G3, G4, and G5 were 14.6%, 0.1%, and 0.0%, respectively, which were similar to the results of the previous study [33].
This study had two main advantages. First, it was a large-scale cohort study involving approximately 40,000 individuals in the general population, and it analyzed clinical parameters associated with the eGFR after 1 year. We confirmed that age, hemoglobin, uric acid, and eGFR are associated with a future change in eGFR, as reported in a population with CKD. The results of our study can facilitate further research to identify clinical factors associated with future change in eGFR among members of the general population. Second, to our knowledge, this is the first study to develop an approximation formula for predicting GFR after 1 year using clinical parameters associated with the eGFR after 1 year. The approximation formula derived from age, hemoglobin, uric acid, and eGFR may be useful in the prediction of GFR after 1 year among members of the general population.
This study also had some limitations. First, its retrospective design might have led to some reporting and selection biases. Second, all participants were residents of Japan, which might reduce the generalizability of the findings. Additional prospective studies involving multiethnic populations are needed to confirm our findings. Third, we used a two-point method to calculate the annual changes in clinical parameters including hemoglobin and uric acid, which may have introduced larger variability because it does not contain any information between the two points [34]. Fourth, we used the random forest method as a machine learning algorithm because of its high accuracy, good performance with many variables, and ability to handle a large amount of training data [35]. However, the conclusions can be no stronger than the quality of the data entered into the machine learning algorithm. Fifth, in the present study, we developed an approximation formula for predicting GFR after 1 year using eGFR after 1 year calculated by the Modification of Diet in Renal Disease formula as a reference standard. However, it has been shown that the discrepancy between GFR measured by inulin clearance and eGFR calculated by the Modification of Diet in Renal Disease formula was greater in the population without CKD [23]. The possibility remains that our formula may not accurately approximate GFR after 1 year among members of the general population. Sixth, in the present study, we did not analyze the clinical parameters associated with the annual change in eGFR. Therefore, further studies are required to assess the accuracy of the approximation formula derived from clinical parameters associated with the annual change in eGFR, in comparison with that of the approximation formula derived from clinical parameters associated with the eGFR after 1 year.
In conclusion, age, hemoglobin, uric acid, and eGFR were associated with the eGFR after 1 year among members of the general population in Japan. An approximation formula including age, hemoglobin, uric acid, and eGFR may be useful for predicting GFR after 1 year in the general population. However, there was a non-negligible discrepancy between the approximate GFR after 1 year and the eGFR after 1 year. This study provides the foundation for additional research to develop an approximation formula for predicting GFR in the general population.

Figure 1. KNIME workflow for random forest method. Data were normalized (Step 1) and divided into a training set and test set (Step 2). Afterward, the importance of each parameter was assessed using the random forest method (Step 3), sorted in descending order (Step 4), and illustrated using a bar chart (Step 5). Next, test data were applied to the prediction model (Step 6) and the accuracy of the model was evaluated (Step 7).


Figure 2. KNIME workflow for multiple linear regression analysis. Data were divided into a training set and test set (Step 1). Next, an approximation formula was established by multiple linear regression analysis using the forced entry method (Step 2). After test data were applied to the approximation formula (Step 3), missing values were excluded (Step 4) and the accuracy of the formula was evaluated (Step 5).

Figure 5. Bland-Altman plot comparing approximate GFR after 1 year calculated by our formula with clinical parameters and estimated GFR after 1 year. Abbreviations: eGFR, estimated glomerular filtration rate; GFR, glomerular filtration rate; SD, standard deviation.

Table 1. Studies that investigated the factors associated with renal function decline.

Table 3. Multiple linear regression analysis of variables correlated with the eGFR after 1 year. (A) Using clinical parameters as independent variables. (B) Using annual changes in clinical parameters as independent variables. Coefficients represent the changes in the dependent variable per unit change in the independent variables.