A novel differential diagnostic model based on multiple biological parameters for immunoglobulin A nephropathy

Background Immunoglobulin A nephropathy (IgAN) is the most common form of glomerulonephritis in China. An accurate diagnosis of IgAN depends on renal biopsy, and there is a lack of non-invasive and practical classification methods for discriminating IgAN from other primary kidney diseases. The objective of this study was to develop a classification model for the auxiliary diagnosis of IgAN using multiparameter analysis with various biological parameters. Methods To establish an optimal classification model, 121 cases (58 IgAN vs. 63 non-IgAN) were recruited and statistically analyzed. The model was then validated in another 180 cases. Results Of the 57 biological parameters, 16 differed significantly (P < 0.05) between IgAN and non-IgAN. The combination of fibrinogen, serum immunoglobulin A level, and manifestation was found to be significant in predicting IgAN. The validation accuracies of the logistic regression and discriminant analysis models were 77.5 and 77.0%, respectively, at a predictive probability cut-off of 0.5, and 81.1 and 79.9%, respectively, at a predictive probability cut-off of 0.40. When the predicted probability of the equation containing the combination of fibrinogen, serum IgA level, and manifestation was more than 0.59, a patient had at least an 85.0% probability of having IgAN. When the predicted probability was lower than 0.26, a patient had at least an 88.5% probability of having non-IgAN. The net reclassification improvement results confirmed that serum immunoglobulin A and fibrinogen had classification power for discriminating IgAN from non-IgAN. Conclusions These models have potential clinical applications in distinguishing IgAN from other primary kidney diseases.


Background
While some nephrologists may administer tentative drugs to patients with chronic kidney disease (CKD) based on the clinical manifestation prior to performing a renal biopsy, they still depend on a percutaneous renal biopsy to make a definite histological diagnosis and thereby determine an efficient drug administration strategy [1][2][3], especially for patients with resistance or unresponsiveness to immunosuppressive agents, anticoagulants, and/or angiotensin-converting enzyme inhibitors (ACEIs) [4][5][6].
However, although it is generally regarded as safe, simple, and easy, this invasive procedure is not risk-free [7]. Furthermore, in our clinical experience, renal biopsies sometimes cannot be performed on certain patients because of contraindications [8,9], patient refusal, or insufficient operative skills of physicians at certain hospitals. Moreover, the pathologic diagnosis obtained from renal biopsies may be variable. It was previously reported that the histological patterns of lupus nephritis commonly changed with repeated renal biopsies [10]. This may be due to disease progression, different surgeons performing the biopsy, differences in the amount and location of the tissue obtained, and discordant opinions among nephropathologists [11,12]. Thus, patients would benefit from a non-invasive and practical classification model for discriminating the pathological subtypes of kidney disease.
It also has been previously reported that mathematical models may be used to classify different diseases or stages of diseases [13,14]. In fact, some classification equations are already being used in kidney disease. In our previous retrospective study, we reported that the combination of six serum indicators could discriminate immunoglobulin A nephropathy (IgAN) from non-immunoglobulin A nephropathy (non-IgAN) with an 82.3% sensitivity and a 68.6% specificity [15]. This classification method was found to be efficient in the auxiliary diagnosis of IgAN, which is still the most common form of glomerulonephritis in China [16].
In the present study, we utilized common statistical analyses (including logistic regression and discriminant analyses) and typical biological parameters to determine clinically practical classification equations for IgAN and non-IgAN.

Design
The present study was a retrospective cohort study conducted in accordance with the Declaration of Helsinki and approved by the Medical Ethics Committee of the Chinese PLA General Hospital. The patient research consent form is provided as Additional file 1. Fasting blood samples were collected on the second day after admission from patients who met the established inclusion criteria. Patients were then screened again according to the established exclusion criteria and divided into two groups: one for establishing the classification model (after 2011) and the other for validating it (before 2011).

Patients
The inclusion criteria were established to pre-screen all patients and were as follows: a) the patient was admitted into the Division of Nephrology at our hospital for the first time; b) a renal biopsy had not been previously performed on the patient for the exact pathologic diagnosis at our or any other hospital; c) the patient had not previously undergone anti-coagulation, immunosuppression, and/or renal replacement therapy; d) the patient may present with hepatitis, diabetes, hypertension, or lupus, but not with a tumor; and e) the patient consented to undergo a renal biopsy during the hospital admission. The exclusion criteria used for the final selection of cases were as follows: a) the renal biopsy was not performed on the included patient for any reason (e.g. the patient refused a renal biopsy examination, the patient's condition worsened during the period of admission, or the kidneys of the patient were atrophied or sclerotic); b) the pathological results indicated that the patient had secondary kidney disease, including diabetic nephropathy, lupus nephritis, and hepatitis-related nephropathy; and c) the pathological results could not ascertain whether the patient had primary nephropathy. After applying the exclusion criteria, 301 cases were selected. The immunofluorescence findings, exact histopathological diagnosis for non-IgAN, and Oxford classification score for IgAN of the 121 patients allocated to the 'modeling' group, which was used in establishing the classification model, are listed in Additional file 2.

Biological parameters and data grouping
Apart from "manifestation", the other 56 biological parameters are listed in Table 1. Data on all 57 biological parameters were collected and divided into two groups according to the renal biopsy results: the IgA nephropathy (IgAN) group, which was defined by the presence of IgA immune complex deposits predominantly within the mesangial region of the renal glomerulus, and the non-IgA nephropathy (non-IgAN) group, which was defined by a lack of IgA immune complexes or the absence of IgA immune complex deposits predominantly within the mesangial region of the renal glomerulus. The selected 301 cases were divided into either the 'modeling' group (after 2011) or the 'validation' group (before 2011).
Statistical analysis
SPSS 17.0 was used for data analysis. Statistical analyses, including t-tests, nonparametric tests (i.e. Mann-Whitney U-test), chi-square tests, and bivariate correlation tests, were conducted for the selection of different parameters. Logistic regression and discriminant analyses were used in establishing the classification model for IgAN and non-IgAN.
The net reclassification improvement (NRI) was used for evaluating the classification improvement of the biological parameters.
Receiver operating characteristic (ROC) curve analyses were performed on these 57 parameters, and the findings (i.e. area under the curve (AUC), 95% confidence interval (CI), and P-value) are presented in Additional file 4. Table 4 contains the C statistics of the 16 significantly different serological parameters, among which five parameters, specifically TP, ALB, Ca, FIB, and sIgA, together with manifestation, were highly significant variables (P < 0.01). sIgA, ALB, and Ca had the three highest AUC values (75.6, 72.7, and 71.8%, respectively) for discriminating between IgAN and non-IgAN (Figure 1). Based on the findings of the t- or U-tests and ROC curve analyses, 16 parameters, including manifestation, FIB, sIgA, sIgG, D2, TP, ALB, CH, TG, LDL, UN, DB, Ca, ALP, CA199, and CA153, were selected for further analysis.
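The AUC reported for a single parameter can be read as the probability that a randomly chosen IgAN patient has a higher value than a randomly chosen non-IgAN patient. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the serum IgA values are hypothetical illustrations, not data or code from the study, which used SPSS.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 0.5 on a tie,
    and 0 otherwise (equivalent to the Mann-Whitney U statistic / (n1 * n2)).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical serum IgA values in mg/dl (illustrative only, NOT study data)
igan = [410.0, 350.0, 290.0, 330.0, 260.0]
non_igan = [180.0, 240.0, 300.0, 210.0, 260.0]

auc = roc_auc(igan, non_igan)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why values such as 75.6% for sIgA indicate moderate discriminating ability on their own.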

Correlation analysis of pre-selected parameters
Multiple correlations were found among biological parameters or medical data. However, multiparameter analysis requires that the explanatory variables be independent of one another. Thus, bivariate correlation tests were performed to eliminate parameters with high multicollinearity before multiparameter analysis. Significant correlations (P < 0.01) were found among almost half of the 16 parameters, specifically among "manifestation", FIB, sIgG, TP, ALB, CH, LDL, and Ca (Figure 2). Based on our clinical experience, we removed TP, LDL, and Ca, and selected the other 13 parameters for further analysis.
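The screening step described above can be sketched as a pairwise Pearson correlation scan that flags strongly correlated parameter pairs for review. This is only an illustration: the 0.8 threshold, the function names, and the sample values are assumptions, and in the study the final choice of which parameter to drop rested on clinical experience rather than on a fixed cut-off.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def highly_correlated_pairs(data, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is a hypothetical choice for illustration only.
    """
    names = sorted(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical values for three parameters (illustrative only, NOT study data)
sample = {
    "ALB": [20.0, 25.0, 30.0, 35.0, 40.0],
    "TP": [45.0, 52.0, 58.0, 66.0, 70.0],
    "TG": [2.0, 1.1, 2.6, 1.4, 1.9],
}

pairs = highly_correlated_pairs(sample)
print(pairs)
```

In this toy sample only the ALB/TP pair is flagged, mirroring the kind of redundancy that led to TP being removed in favor of ALB.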

Modeling with multiparameter analysis
Logistic regression and discriminant analyses were used to establish the IgAN and non-IgAN classification model. The 13 pre-selected parameters were manifestation, FIB, D2, sIgA, sIgG, ALB, UN, CH, TG, DB, ALP, CA199, and CA153.
a) Model based on logistic regression analysis: Apart from manifestation, each of the other 12 pre-selected parameters was entered into a binary logistic regression as an explanatory variable using the "Enter" method for univariate analysis (Table 5). Parameters with P < 0.2 in univariate logistic regression were retained to prevent the exclusion of important variables. With the exception of UN, the other 12 variables had a P < 0.2 and were all entered into the multivariate logistic regression using the forward conditional method of entry. The predicted probabilities (PRE-1) were calculated and saved. Using multivariate logistic regression analysis, it was found that only manifestation, FIB, and sIgA were significant predictors of IgAN (Table 6). The classification model with these 3 parameters was evaluated, and it was found that accuracy was 76.9%, sensitivity was 74.1%, specificity was 79.4%, false positive rate (α) was 20.6%, false negative rate (β) was 25.9%, positive predictive value (PPV) was 76.8%, negative predictive value (NPV) was 76.9%, positive likelihood ratio (+LR) was 3.59, negative likelihood ratio (−LR) was 0.32, and Youden's index was 0.535. The area under the ROC curve with PRE-1 for IgAN was 83.8% (P < 0.0001, 95% CI: 0.766-0.910) (Figure 3).
Figure 2 Correlation coefficients between two variables of pre-selected variables.
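The performance measures reported for the three-parameter logistic regression model all follow from a single 2×2 table. The sketch below shows the relationships; the counts (43 true positives, 15 false negatives, 50 true negatives, 13 false positives) are an assumption inferred from the reported percentages and the 58 IgAN / 63 non-IgAN modeling cases, not values quoted from a table in the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic-accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,   # alpha
        "false_negative_rate": 1 - sensitivity,   # beta
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "youden": sensitivity + specificity - 1,
    }


# Counts inferred from the reported percentages (an assumption, see lead-in)
metrics = diagnostic_metrics(tp=43, fn=15, tn=50, fp=13)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these inferred counts the computed −LR is about 0.326, which agrees with the reported 0.32 only up to rounding or truncation.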

Validation of the two models
One hundred and eighty new cases were substituted into the two equations for PRE-1 and PRE-2. Each predicted probability was calculated and compared with the biopsy diagnosis. The sensitivity and specificity were compared between the different cut-off points of predicted probabilities (Table 8). When the cut-off point of the predicted probabilities was decreased to 0.40, the sensitivities of the two models increased, whereas the specificities decreased. When the cut-off point of the predicted probabilities was 0.40, the frequency of misdiagnosis of the two models was higher for predicted probabilities between 0.26 and 0.59 than for those <0.26 or >0.59 (Figure 5). This indicates that when we use a mathematical model for predicting a clinical diagnosis, we have to pay close attention to the cases near the cut-off points of the predicted probabilities, as they are prone to misdiagnosis. Further analysis indicated that, when the predicted probability is >0.59 or <0.26, the patient has at least an 85.0 or 88.5% probability of having IgAN or non-IgAN, respectively (Table 9).
Figure 4 Area under ROC curve of the predicted probability of IgAN with "FIB + sIgA + Manifestation" combination from discriminant analysis. Area under the ROC curve for predicting immunoglobulin A nephropathy (IgAN) with the equation derived via discriminant analysis, which includes the "fibrinogen (FIB) + serum immunoglobulin A level (sIgA) + manifestation" combination. The state variable is IgAN.
Figure 5 Misdiagnosis rates of the two models with different cut-offs for the predicted probability.
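The two thresholds described above effectively split the predicted probability into a confident IgAN zone, a confident non-IgAN zone, and an intermediate zone where misdiagnosis is more frequent. A minimal sketch of that three-way reading follows; the function name `triage` and the zone labels are hypothetical, and the handling of values exactly equal to 0.26 or 0.59 is an assumption, since the text only specifies strict inequalities.

```python
def triage(predicted_probability, low=0.26, high=0.59):
    """Map a model's predicted probability of IgAN to one of three zones.

    low and high default to the thresholds reported in the text; values
    exactly on a threshold are treated as indeterminate (an assumption).
    """
    if not 0.0 <= predicted_probability <= 1.0:
        raise ValueError("predicted probability must lie in [0, 1]")
    if predicted_probability > high:
        return "likely IgAN"
    if predicted_probability < low:
        return "likely non-IgAN"
    return "indeterminate"


for p in (0.10, 0.40, 0.75):
    print(p, "->", triage(p))
```

Cases falling in the indeterminate zone are the ones that, according to the validation results, deserve the closest attention.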

Analysis of the net reclassification improvement (NRI)
A logistic regression model and a discriminant analysis model were built as two primary models with the parameters "gender" and "manifestation". The 12 pre-selected biological parameters (sIgA, ALB, FIB, CH, TG, ALP, D2, sIgG, DB, CA153, CA199, and UN) were entered into the net reclassification improvement (NRI) algorithm to assess their classification power between IgAN and non-IgAN. Based on the above results, we divided the predicted probability into four categories: 0~0.26, 0.26~0.4, 0.4~0.59, and 0.59~1. First, gender and manifestation were used as the original parameters of the models. Next, the other 12 parameters were added one by one in order of significance (Table 4), and the NRI and P value were checked at each step. The results showed that only sIgA and FIB significantly improved the performance of the models. The NRI values of sIgA and FIB were 0.290 and 0.168, respectively (P < 0.005), in the linear logistic regression model, and 0.308 and 0.169, respectively (P < 0.005), in the linear discriminant analysis model (Table 10). Each step of adding the 12 parameters to the basic models is listed in Additional file 5.
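A category-based NRI compares, for each patient, the risk category assigned by the primary model with the category assigned after a new marker is added: upward moves are credited for IgAN cases and downward moves for non-IgAN cases. The sketch below uses the four categories given above; the function names, the assignment of boundary values to the upper category, and the probabilities are hypothetical illustrations, and the significance test for the NRI is not shown.

```python
from bisect import bisect_right

# Category boundaries from the text: 0~0.26, 0.26~0.4, 0.4~0.59, 0.59~1
CUTS = (0.26, 0.40, 0.59)


def category(probability):
    """Index (0-3) of the risk category; boundary values go to the upper one."""
    return bisect_right(CUTS, probability)


def categorical_nri(old_probs, new_probs, is_event):
    """NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)]."""
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for old, new, event in zip(old_probs, new_probs, is_event):
        move = category(new) - category(old)
        if event:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted probabilities (illustrative only, NOT study data)
old = [0.30, 0.45, 0.50, 0.65, 0.30, 0.45, 0.20, 0.50]
new = [0.45, 0.62, 0.50, 0.55, 0.20, 0.30, 0.20, 0.62]
event = [True, True, True, True, False, False, False, False]

nri = categorical_nri(old, new, event)
print(f"NRI = {nri:.3f}")
```

A positive NRI means the added marker moves more patients toward the correct category than away from it, which is the sense in which sIgA and FIB were found to improve the primary models.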

Decision procedure
The decision procedure for the diagnosis of IgA nephropathy in patients with suspected kidney disease, which is based on the validation dataset and the equation from the discriminant analysis, is presented in Figure 6.

Discussion
When statistics are used to determine the significant predictors for a diagnosis or classification of a disease, different statistical algorithms, biological datasets, and parameters may result in different outputs [18][19][20]. Furthermore, multicollinearity is almost always present among medical laboratory parameters, which may also introduce variability and instability into a statistical model [21]. Thus, choosing appropriate variables for multiparameter analysis is very important. The present study was designed as a cohort study and was based on a previous retrospective study [15]. Compared with the previous study, this study included more parameters, including fibrinogen, D-dimer, serum IgA, and complement C3, all of which are known biomarkers of kidney diseases [22,23]. Based on univariate analysis, correlation analysis, and clinical experience, 13 out of 57 routine and useful parameters were selected as candidate predictors of IgAN. These were as follows: manifestation, FIB, D2, sIgA, sIgG, UN, ALB, TG, CH, DB, ALP, CA199, and CA153. Three indicators, specifically TP, LDL, and Ca, were screened out, as they demonstrated the highest correlations with other indicators (correlation coefficients: TP/ALB = 0.936, LDL/CH = 0.968, and Ca/ALB = 0.813). Similar results were obtained with two of the most frequently used multiparameter analyses, namely logistic regression and discriminant analyses, indicating that the three predictors retained in the models (manifestation, FIB, and sIgA) are truly significant in classifying IgAN and non-IgAN.
Furthermore, 180 new cases were used to validate the two derived equations for classifying IgAN. The discerning power of the two classification equations was similar in the validation cases. The different cut-off points of the predicted probabilities resulted in different diagnostic efficiencies, indicating that the cases near the cut-off point require more attention. Further analysis indicated that the misdiagnosis rate of cases with predicted probabilities between 0.26 and 0.59 was higher than that of cases with predicted probabilities <0.26 or >0.59.
The net reclassification improvement (NRI), proposed by Pencina et al., is used for evaluating the classification improvement when a new marker is added to a primary model [24]. To further investigate the classification power of the pre-selected biological parameters, we used "gender" and "manifestation" to create a basic linear logistic regression model and a linear discriminant analysis model. The NRI results indicated that only sIgA and FIB contributed positively to discriminating IgAN from non-IgAN in this dataset (Table 10).
The exact pathogenesis of IgAN has not yet been elucidated. Aberrant circulating IgA1 molecules with glycan (galactose or sialic acid) deficiencies in the hinge region are generally deemed to be a crucial initiating factor in the development and pathological characteristics of IgAN [25][26][27][28]. Previous reports indicated that abnormally glycosylated IgA1 molecules had a higher affinity for the specific IgA1 receptor on mesangial cells [29], were apt to deposit in the kidneys in combination with circulating IgG molecules or as self-assembled macromolecules [30,31], and were difficult for the liver to clear [32]. Since IgA1 is the predominant isotype of IgA in circulation [33], the serum IgA level could reflect the serum IgA1 level. Some reports showed that patients with IgAN had elevated serum IgA levels, and consequently, serum IgA might be used as a potential diagnostic marker for IgAN [34,35]. Nevertheless, using the degree of serum IgA elevation to differentiate IgAN from other subtypes of kidney disease is not widely accepted. The present study indicated that the serum IgA level was elevated in patients with both IgAN (331.3 ± 103.9 mg/dl) and non-IgAN (241.5 ± 102.3 mg/dl) relative to the reference range of 70~180 mg/dl (Table 1). Serum IgA, although seemingly not a specific marker for IgAN, still differed significantly between the groups and had differential diagnostic value (area under the ROC curve: 75.6%, P < 0.0001), which corroborates the views of some previous studies [23].
When serum IgA was combined with the other 2 parameters, namely manifestation and fibrinogen, the diagnostic accuracy increased from 75.6 to 83.9%, as determined by ROC curve analysis, suggesting that, besides serum IgA, clotting mechanisms might differ in the development of IgAN and non-IgAN, which was reflected in the proportion of nephrotic syndrome in IgAN (17.2%) and non-IgAN (52.4%). To be precise, serum IgA was a relatively specific marker for IgAN, whereas fibrinogen and manifestation were two relatively specific markers for non-IgAN. Of the 63 non-IgAN patients in the modeling group, 55.6% had membranous nephropathy or minimal change disease (Additional file 1). Nephrotic syndrome is the most common clinical manifestation of these two subtypes of glomerular disease [36]. Patients with nephrotic syndrome are always in a state of hypercoagulability and hyperfibrinolysis [37,38], which could be caused by the increased synthesis of blood coagulation factors in the liver, the increased consumption of antithrombin, and the decreased levels of protein S, protein C, and plasminogen [39,40]. Therefore, as Factor I, the serum fibrinogen level was higher in non-IgAN, which was characterized by the predominance of nephrotic syndrome, than in IgAN, and accordingly had discerning power between the two groups, as did D-dimer.
Figure 6 Decision procedure for the diagnosis of immunoglobulin A (IgA) and non-IgA nephropathy in patients with suspected kidney disease.
Other biological parameters that differed significantly between IgAN and non-IgAN, such as TP, ALB, CH, TG, LDL, and sIgG, were also linked to the different proportions of nephrotic syndrome (Table 2), which is characterized by massive proteinuria, hypoalbuminemia, edema, and varying degrees of hyperlipidemia [36]. Moreover, given that Ca binds to ALB in blood [41], non-IgAN patients who presented with nephrotic syndrome showed decreased serum Ca levels following a decrease in ALB. This was confirmed by the high correlation coefficient between Ca and ALB (0.813) in our analysis.
Furthermore, although DB was significantly different between IgAN and non-IgAN, the difference between the means was small (3.1 ± 1.8 μmol/L vs. 2.4 ± 1.3 μmol/L), and DB levels in most patients were normal. It has been reported that serum DB correlates with estimated glomerular filtration rate (eGFR) [42]; however, we did not find this correlation with eGFR calculated by the CKD-EPI equation [43] (P = 0.35, correlation coefficient = 0.086) in this study. We therefore believe that the difference in DB between IgAN and non-IgAN had no clinical significance.
We previously carried out a study analyzing the clinical significance of serum CA125 and CA199 levels and their correlated factors in patients with chronic nephropathy. The results indicated that serum levels of CA125 and CA199 tended to increase when chronic nephropathy was complicated by serous effusions or by other factors favoring the formation of serous effusions, such as nephrotic syndrome [44]. CA153 was also correlated with ALB (correlation coefficient = −0.436, P < 0.0001), CH (correlation coefficient = 0.451, P < 0.0001), nephrotic syndrome (correlation coefficient = 0.418, P < 0.0001), FIB (correlation coefficient = 0.393, P < 0.0001), and LDL (correlation coefficient = 0.440, P < 0.0001). Thus, the significant differences in these two parameters between IgAN and non-IgAN could also be due to the different proportions of nephrotic syndrome.