Syndrome Differentiation of IgA Nephropathy Based on Clinicopathological Parameters: A Decision Tree Model

Background. IgA nephropathy is the most common cause of primary glomerulonephritis in China, and Traditional Chinese Medicine (TCM) is a vital treatment strategy. However, not all doctors prescribing TCM have adequate knowledge to classify the syndrome accurately. Aim. To explore the feasibility of differentiating TCM syndrome types among IgA nephropathy patients based on clinicopathological parameters. Materials and Methods. This cross-sectional study enrolled 464 adult patients with biopsy-proven IgA nephropathy from 2010 to 2016. Demographic data, clinicopathological features, and TCM syndrome types were collected, and decision tree models based on the classification and regression tree were built to differentiate between the syndrome types. Results. The 370 patients in the training dataset had a median age of 32 years, serum creatinine of 79 μmol/L, estimated glomerular filtration rate (eGFR) of 97.2 mL/min/1.73 m², and proteinuria of 1.0 g/day. The Oxford classification scores were as follows: M1 = 97.6%, E1 = 14.6%, S1 = 50.0%, and T1 = 52.2%/T2 = 18.4%. The decision trees with and without MEST scores achieved equal precision in the training data. However, the tree with MEST scores performed better in the validation dataset, especially in classifying the syndrome of qi deficiency of spleen and kidney. Conclusion. A feasible method was proposed to deduce the TCM syndromes of IgA nephropathy patients from parameters commonly available in routine clinical practice. Adding the MEST scores to the clinical data helped in the differentiation of TCM syndromes.


Introduction
IgA nephropathy has been recognized as the most common form of primary glomerulonephritis [1][2][3][4][5] and is a leading cause of chronic kidney disease (CKD) and end-stage renal disease (ESRD) [6]. In China, IgA nephropathy accounts for 32-54% of primary glomerulonephritis [3,7], and >30% of IgA nephropathy patients progress to ESRD within 20 years after biopsy [7]. Although recommended by the guidelines, steroids and immunosuppressive agents are not suitable for all patients because of side effects [8], especially for those with an estimated glomerular filtration rate (eGFR) below 30 mL/min/1.73 m². As in several other developing countries, alternative therapies are considered pivotal and general treatment strategies for IgA nephropathy in China, especially the decoctions and patent medicines of TCM [9][10][11]. Indeed, IgA nephropathy patients have benefited from TCM treatment, and the mechanism has been partially unveiled [12][13][14][15].
The experience of 3000 years and modern research show that Chinese medicine must be applied with syndrome classification under the direction of TCM theory; only in this way can its effectiveness be significant and adverse events be avoided [16]. A syndrome is mainly identified by elements rich in TCM information, including medical history, symptoms, and signs, which makes it somewhat similar to a symptom cluster. These elements are commonly collected by observation, listening/smelling, questioning, and pulse analysis. Western medicine and TCM explore different dimensions but share the same objects of study.
Evidence-Based Complementary and Alternative Medicine
In recent years, continued in-depth study has established close connections between modern diseases and TCM at multiple levels [17], such as those between syndromes and clinicopathological parameters in IgA nephropathy [18]. Nevertheless, TCM syndromes have shortcomings in objectivity and consistency and are hard to disseminate, especially to Western doctors. Is there a method to help clinicians with little TCM knowledge improve their syndrome differentiation? The TCM syndrome types have been reported to show distinct clinical and pathological features [19][20][21][22]. The method used in the present study is the decision tree, a predictive model that maps observations about an item to a conclusion about its target value [23,24]. The observations are represented on the branches of the tree and the target values in the leaves. A decision tree can represent the process of decision-making visually and explicitly, which makes it one of the predictive modeling approaches most commonly used in statistics, data mining, and machine learning. Therefore, we used decision trees to explore the feasibility of syndrome differentiation among IgA nephropathy patients based on clinicopathological parameters.

Pathological Studies.
The standard process of interpreting renal biopsy tissues included light microscopy, immunofluorescence, and electron microscopy. For light microscopy, all specimens were stained with hematoxylin and eosin (H&E), periodic acid-Schiff, Masson's trichrome, and Jones' methenamine silver. Renal biopsies were scored according to the Oxford MEST scoring system [25] by two pathologists who were blinded to the patients' syndrome types. Agreement on the definitions and scoring of pathological features was required during the pathological review.

Syndrome Differentiation.
The evaluation of TCM syndromes was performed according to the Guiding Principle of Clinical Research on New Drugs of Traditional Chinese Medicine [26] for the treatment of chronic nephropathy. The information on the syndromes was taken from the record of the latest ward round by the Chinese medical practitioner. In the training dataset, 273 patients showed qi deficiency of spleen and kidney (QDSK), 66 had deficiency of both qi and yin (DBQY), 19 exhibited yang deficiency of spleen and kidney, 8 had yin deficiency of liver and kidney, 3 presented lower energizer damp-heat, and 1 had lung wind-heat. To increase statistical power, the four syndrome types with the smallest sample sizes were combined into a category of other types (OTs). In the validation dataset, 80 patients had QDSK and 14 had DBQY.
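The pooling of the rare syndrome types described above can be illustrated with a minimal Python sketch. The counts are those reported for the training dataset; the function name `merge_rare_types` and its parameters are hypothetical and only serve the illustration, not the authors' actual workflow.

```python
from collections import Counter

# Recorded syndrome counts in the training dataset, as reported in the text.
TRAINING_COUNTS = {
    "QDSK": 273,
    "DBQY": 66,
    "yang deficiency of spleen and kidney": 19,
    "yin deficiency of liver and kidney": 8,
    "lower energizer damp-heat": 3,
    "lung wind-heat": 1,
}


def merge_rare_types(counts, keep=("QDSK", "DBQY"), other_label="OTs"):
    """Pool every syndrome type not listed in `keep` under `other_label`."""
    merged = Counter()
    for syndrome, n in counts.items():
        merged[syndrome if syndrome in keep else other_label] += n
    return dict(merged)


merged = merge_rare_types(TRAINING_COUNTS)
print(merged)                # {'QDSK': 273, 'DBQY': 66, 'OTs': 31}
print(sum(merged.values()))  # 370
```

The pooled OTs category therefore contains 31 patients, and the three categories together account for all 370 patients of the training dataset.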

Statistical Analysis.
Continuous data with a normal distribution were presented as mean and standard deviation (SD), and those with a non-normal distribution as median and interquartile range (IQR). Ranked data with equal intervals were expressed as median and IQR, and those with unequal intervals as counts and percentages of each rank. Categorical variables were expressed as counts and percentages. Comparisons between groups were conducted using the independent-samples t-test or Kruskal-Wallis test for continuous variables and the chi-squared test or Fisher's exact test for categorical variables. All tests were two-tailed, and a P value < 0.05 was considered statistically significant. The analyses were conducted with SPSS 17.0 statistics software (SPSS, Inc., Chicago, IL). The decision tree was built in R with the rpart and rpart.plot packages, based on the data from 370 patients biopsied between 1 January 2010 and 31 December 2014. The variables involved in modeling included gender, age, mean arterial pressure (MAP), disease course, presence of macroscopic hematuria, eGFR, uric acid, serum albumin, hemoglobin, proteinuria, and urinary red blood cell count. The MEST scores were then added to the model to explore whether pathological parameters would increase its accuracy. Positive predictive value (PPV), negative predictive value (NPV), integrated discrimination improvement (IDI), and net reclassification index (NRI) were calculated to evaluate the improvement in predicting QDSK by the model with MEST scores.
Table 1 lists the demographic and clinical characteristics at biopsy of the IgA nephropathy patients in the training dataset and compares the clinicopathological parameters of the different TCM syndrome types. The median age was 32 years (IQR, 27-42 years), and 196 (53.0%) patients were female; the median disease course was 6 months. Although infection was the most common inducement (11.1%), the majority of the patients (78.6%) had no obvious inducement. Hypertension was present in 136 (36.8%) patients at baseline. Microscopic hematuria was found in almost all patients, but only 66 (17.8%) patients showed macroscopic hematuria. The overall level of proteinuria was 1 (0.5-2.2) g/day, and serum albumin was 40.3 (35.8-43.8) g/L. The serum creatinine at the time of biopsy was 79 (59-108) μmol/L, and eGFR was 97.2 (67.2-120.5) mL/min/1.73 m². The Oxford classification scores were as follows: M1 = 97.6%, E1 = 14.6%, S1 = 50.0%, and T1 = 52.2%/T2 = 18.4%.
OTs showed a higher proportion of hypertension than DBQY (P = 0.008). Although patients with QDSK seemed less likely than those with OTs to have macroscopic hematuria (P = 0.015), and some difference was seen in urinary red blood cell count (P = 0.015), pairwise tests could not locate the difference. Moreover, no obvious difference was observed in hemoglobin, proteinuria, uric acid, serum albumin, serum IgA, or C3 among the three types. Regarding renal function, the serum creatinine of patients with DBQY was lower than that of patients with QDSK (P = 0.019) and OTs (P < 0.001). The eGFR of patients with DBQY was higher than that of patients with QDSK (P = 0.014) and OTs (P < 0.001), and the eGFR of OTs was lower than that of QDSK (P = 0.012). When the patients' eGFR was divided into levels according to the 2012 Kidney Disease: Improving Global Outcomes (KDIGO) Clinical Practice Guideline for the Evaluation and Management of CKD [27], the eGFR levels of OTs were lower than those of DBQY (P < 0.001) and QDSK (P = 0.005), primarily because the percentage of patients with an eGFR below 60 mL/min/1.73 m² was significantly larger in OTs (41.9%) than in DBQY (7.6%) and QDSK (20.5%).
With respect to the pathological features, the three syndrome types seemed to differ in M scores (P = 0.032); however, the pairwise analysis did not show a significant difference, and no statistical difference was observed in the E, S, and T scores or in the proportion of crescents.
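The division of eGFR into KDIGO levels mentioned above can be sketched as follows. The cut-offs are the GFR categories G1 to G5 of the KDIGO 2012 guideline; the function name is hypothetical, and the exact grouping used in the study (for example, whether G3a and G3b were kept separate) is an assumption.

```python
def kdigo_gfr_category(egfr):
    """Return the KDIGO 2012 GFR category for an eGFR in mL/min/1.73 m2."""
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    if egfr >= 90:
        return "G1"   # normal or high
    if egfr >= 60:
        return "G2"   # mildly decreased
    if egfr >= 45:
        return "G3a"  # mildly to moderately decreased
    if egfr >= 30:
        return "G3b"  # moderately to severely decreased
    if egfr >= 15:
        return "G4"   # severely decreased
    return "G5"       # kidney failure


# Median and lower quartile of eGFR reported for the training dataset.
print(kdigo_gfr_category(97.2))  # G1
print(kdigo_gfr_category(67.2))  # G2
```

Under this scheme, the patients with an eGFR below 60 mL/min/1.73 m² referred to in the text are those in categories G3a to G5.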

Decision Trees.
The decision tree without MEST scores for TCM syndromes is shown in Figure 1. The patients were classified according to the values of the variables at each node: patients fall into the node on the left if they meet the condition in the rhombus and into the node on the right if they do not. The predicted syndrome types are shown in the final nodes. For patients with a urinary red blood cell count < 44/μL at biopsy, the TCM fundamental syndrome was more likely to be QDSK; however, OTs were more likely in those who also had a MAP not less than 123 mmHg. For patients with a urinary red blood cell count not less than 108/μL, the TCM syndrome was more likely to be QDSK. For patients with a urinary red blood cell count between 44/μL and 107/μL, additional information on age, eGFR, MAP, and proteinuria was required to classify the syndrome. Table 2 and Figure 1 summarize the syndromes predicted by the decision tree, grouped by recorded syndrome. The precision of QDSK was 93.4%, that of DBQY was 57.6%, and that of OTs was 22.6%. The decision tree model without MEST scores achieved a total precision of 77.6%. We also attempted to build a decision model with MEST scores (Figure 2). The tree was quite similar to the model described above, except that for patients below 46 years of age with a urinary red blood cell count between 44/μL and 107/μL, the S score of MEST, eGFR, and disease course were essential for decision-making. The predictions of the model with MEST scores are presented in Table 3 and Figure 2. The precision of QDSK was 96.0%, that of DBQY [...]
[...] Table 4. The systolic blood pressure (P < 0.001), diastolic blood pressure (P = 0.001), and MAP (P < 0.001) were higher in the validation dataset than in the training dataset (Table 5). The higher uric acid (P < 0.001) and serum creatinine (P < 0.001) and the lower eGFR (P < 0.001) implied that kidney injury was more severe in the validation dataset. However, the M score (P < 0.001) and T score (P < 0.001) were smaller in the validation dataset. We analyzed the eGFR for each T score (Table 6) and found that it declined from T0 through T1 to T2 in both the training and validation datasets. In other words, a larger T score corresponded to a lower eGFR in both datasets. In the model without MEST scores (Table 7), 70 (87.5%) patients were accurately predicted as QDSK, leading to a total precision of 75.5%. The total precision of the model with MEST scores was 80.9%, as 75 patients were accurately predicted as QDSK (Table 8). Because the precision of predicting DBQY was identical in the two models, additional indexes were evaluated to determine whether MEST scores supported the prediction of QDSK (Table 9). The PPV and NPV became larger after adding MEST scores to the clinical data, and the values of IDI and NRI were > 0. Thus, adding MEST scores to the clinical data seemed to slightly improve the precision in predicting syndrome types.
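The total precision figures above can be checked with a short Python sketch that works from the validation counts reported for the model without MEST scores (70, 4, and 6 patients among the 80 recorded QDSK cases; 13, 1, and 0 among the 14 recorded DBQY cases). The helper names are hypothetical. The PPV and NPV printed here follow the usual 2 x 2 definitions applied to these counts and are an illustration only; they are not taken from Table 9.

```python
# Rows: predicted syndrome; inner keys: recorded syndrome.
# Validation dataset, model without MEST scores, counts as reported in the text.
PREDICTED_BY_RECORDED = {
    "QDSK": {"QDSK": 70, "DBQY": 13},
    "DBQY": {"QDSK": 4, "DBQY": 1},
    "OTs": {"QDSK": 6, "DBQY": 0},
}


def accuracy(table):
    """Share of patients whose predicted syndrome equals the recorded one."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(row.get(predicted, 0) for predicted, row in table.items())
    return correct / total


def ppv_npv(table, target):
    """PPV and NPV for predicting `target` versus every other syndrome."""
    tp = table[target][target]
    fp = sum(n for recorded, n in table[target].items() if recorded != target)
    fn = sum(row.get(target, 0) for predicted, row in table.items()
             if predicted != target)
    tn = sum(
        n
        for predicted, row in table.items() if predicted != target
        for recorded, n in row.items() if recorded != target
    )
    return tp / (tp + fp), tn / (tn + fn)


print(round(accuracy(PREDICTED_BY_RECORDED), 3))  # 0.755
ppv, npv = ppv_npv(PREDICTED_BY_RECORDED, "QDSK")
print(round(ppv, 3), round(npv, 3))               # 0.843 0.091

# Model with MEST scores: 75 correct QDSK and 1 correct DBQY prediction
# out of 94 patients, as stated in the text.
print(round((75 + 1) / 94, 3))                    # 0.809
```

Both totals agree with the reported 75.5% and 80.9%, which shows that "total precision" in this section corresponds to overall accuracy across the 94 validation patients.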

Discussion
TCM syndromes of kidney diseases indeed have shortcomings in objectivity and consistency, which makes syndrome classifications difficult to reproduce. In recent years, however, with the gradual deepening of TCM research in the field of kidney diseases, some definite relationships between TCM syndromes and clinicopathological parameters have been found. With appropriate TCM treatment based on syndrome differentiation, patients with kidney diseases can have their general condition and clinical indicators improved. Among kidney diseases, IgA nephropathy is the most deeply explored [28,29]. A kidney disease research team led by an academician explored the correlation between TCM syndromes and clinicopathological parameters among 1016 IgA nephropathy patients and found that TCM syndromes were closely related to prognostic factors such as proteinuria, hypertension, and renal tissue injury [30,31]. Guided by these studies, a randomized controlled trial (RCT) was conducted, which found that treatments based on syndrome differentiation can decrease urine protein and serum creatinine without obvious adverse events [32,33]. Soon afterwards, the Chinese Association of Integrative Medicine issued a guideline to disseminate the evidence for treating IgA nephropathy based on syndrome differentiation. However, the diagnosis of a syndrome depends largely on the clinician's educational background and experience, and not all renal physicians can master the skill in a short time. Because the syndromes correlate so closely with clinicopathological parameters, it is feasible to use these parameters to assist clinicians in classifying TCM syndromes and applying TCM treatments. Such a model could improve the precision of syndrome differentiation and make the syndromes more objective and repeatable. Improved syndrome differentiation could benefit the consistency of the therapeutic effect of TCM as well as the conduct of a future large-scale prospective RCT aiming to further verify the effect of TCM on IgA nephropathy. All of this work will contribute to the dissemination and development of Chinese medicine.

Table 7: Syndromes predicted by the model without MEST scores in the validation dataset, by recorded syndrome, n (%).
Predicted syndrome                    Qi deficiency of spleen and kidney (n = 80)    Deficiency of both qi and yin (n = 14)
Qi deficiency of spleen and kidney    70 (87.5)                                      13 (92.9)
Deficiency of both qi and yin         4 (5.0)                                        1 (7.1)
Other types                           6 (7.5)                                        0
Note: the accuracy of the model was 75.5%.
In the present study, demographic data, clinicopathological parameters, and TCM fundamental syndromes of IgA nephropathy patients were retrospectively collected. The syndromes could be classified by decision tree models using five or six parameters. For patients without biopsy, five variables were needed to classify the syndromes: urinary red blood cell count, MAP, age, eGFR, and proteinuria. For biopsied patients, the renal tissue supplied additional details, and the syndromes could be differentiated by urinary red blood cell count, MAP, age, disease course, eGFR, and S score. Although the models with and without pathological features achieved identical precision in the training data, the model with MEST scores showed an advantage in validation, especially in predicting QDSK.
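To show how such a tree reads in practice, the sketch below encodes only the top-level splits that the Results spell out for the model without MEST scores: a urinary red blood cell count below 44/μL, a count of at least 108/μL, and a MAP of at least 123 mmHg. The deeper splits on age, eGFR, MAP, and proteinuria are not given in the text, so the function returns `None` for the intermediate branch rather than guessing. The function name and the unit per μL are assumptions.

```python
def first_level_prediction(urbc_per_ul, map_mmhg):
    """Apply only the splits stated in the text for the tree without MEST.

    Returns the more likely syndrome, or None when the patient falls into
    the intermediate branch, whose further thresholds are not reported.
    """
    if urbc_per_ul < 44:
        # Low urinary red blood cell count: QDSK unless MAP is very high.
        return "OTs" if map_mmhg >= 123 else "QDSK"
    if urbc_per_ul >= 108:
        return "QDSK"
    # Between 44 and 107: age, eGFR, MAP, and proteinuria are also needed.
    return None


print(first_level_prediction(20, 95))    # QDSK
print(first_level_prediction(20, 125))   # OTs
print(first_level_prediction(150, 95))   # QDSK
print(first_level_prediction(80, 95))    # None
```

Each decision is a single yes/no comparison, which is what makes the path from the parameters to a predicted syndrome easy to follow at the bedside.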
When we compared the baseline clinicopathological features of the three syndrome types in the training dataset, OTs performed poorly in multiple clinical indexes. OTs showed a higher proportion of hypertension than DBQY and were more likely to have macroscopic hematuria. In addition, the variables related to renal function indicated more severe kidney injury in patients with OTs. As the OTs encompassed four syndrome types with small sample sizes, it was difficult to identify the responsible factor; however, 27 (87.1%) of these patients had yin deficiency of liver and kidney or yang deficiency of spleen and kidney. A cross-sectional study surveyed the clinicopathological features of 1148 Chinese IgA nephropathy patients [34]. The study found that the renal function of those with yin deficiency of liver and kidney or yang deficiency of spleen and kidney was worse than that of those with DBQY or qi deficiency of lung and kidney. Moreover, the MAP of patients with yin deficiency of liver and kidney or yang deficiency of spleen and kidney was higher than that of those with DBQY or qi deficiency of lung and kidney. Therefore, the poorer values of OTs in multiple clinical indexes may be attributed to the types of yin deficiency of liver and kidney and yang deficiency of spleen and kidney.
To our knowledge, this is the first report differentiating TCM syndromes by building models with objective indexes in IgA nephropathy. In other diseases, some models have been developed to diagnose patients' syndromes [35][36][37][38][39][40]. Some of these studies were based on syndrome factors, which comprise subjective indexes and therefore face challenges in extrapolation and repeatability [36,40]. Cluster analysis can group similar clinical features together, and the groups may be related to different syndrome types; however, similarity of features does not indicate co-occurrence of clinical features or symptoms [39]. Questionnaires are good for quantifying variables, saving time and money, and enlarging the sample size rapidly [37]; nevertheless, it is difficult to design a questionnaire that remains reliable and valid across large populations. In the present study, a method of classifying TCM syndromes with objective variables was demonstrated by building decision tree models from the clinicopathological parameters of 370 IgA nephropathy patients and then validating the models in 94 patients. The rpart package is based on the method of Classification and Regression Trees [41] and builds classification or regression models of a very general structure using a two-stage procedure [42]. The resulting models are represented as binary trees. Yes-or-no judgments are the only requirement in the process of decision-making, which simplifies interpretation. The decision-making process is entirely visible, which makes validation, promotion, and application convenient. The nodes of the trees are determined by the values of the variables, and the cut-off points are chosen on the basis of statistical probability. Variables that do not contribute to decision-making are ignored. Unlike random forest, this method is less affected by partially missing data. Thus, the decision tree is sufficiently simple for use in clinical practice.
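How a cut-off point at a node can be chosen from the data is sketched below in plain Python. The sketch assumes the Gini impurity as the splitting criterion, which is the default for classification trees in rpart but is not stated in the text, and it places candidate thresholds midway between adjacent distinct values. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t for the rule `value < t` that minimizes the
    size-weighted Gini impurity of the two child nodes.

    Returns (threshold, weighted_impurity); the threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: one numeric variable and two classes.
values = [10, 20, 30, 120, 150, 200]
labels = ["A", "A", "A", "B", "B", "B"]
print(best_split(values, labels))  # (75.0, 0.0)
```

A tree is grown by repeating this search over every candidate variable at each node, so a variable that never yields the best split does not appear in the final tree.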
The classification and regression tree has several advantages of its own, whether it is regarded as a decision-making tool or as a statistical method. One obvious advantage is that its logic and operation are easily understood, making it intuitive and clear. This advantage is prominent in comparison with methods whose weighting and prediction mechanisms are opaque, such as neural networks. Another advantage is its way of handling missing data, which makes full use of the data. The most common approach to missing values is to delete the incomplete observations, but the classification and regression tree deletes only the observations with a missing dependent variable, while observations with missing independent variables are retained. The method also exhibits adequate stability and versatility: because the splits depend on the rank order of the values rather than on their numeric magnitude, even some outliers in the dataset have little influence on the result. The result is less affected by the choice of variables than in multiple regression analysis, as adding or eliminating a variable does not affect the result unless that variable is involved in the tree. Compared with simple regression, the classification and regression tree also removes the trouble of selecting variables, because the optimal cut-off point of each involved variable is selected automatically while the tree is constructed.
Several limitations should be considered when interpreting our study. This was a retrospective study; thus, the types of TCM syndromes were not strictly classified by identical criteria, and post hoc reclassification was necessary. Apart from the combined type of OTs, only two syndromes, QDSK and DBQY, entered the statistical analysis, which makes the generalization of the findings to other TCM syndromes uncertain. The precision of classifying OTs was well below one-third, probably because this category mixed several types and blurred the characteristics of each original type. The clinicopathological parameters of the training and validation datasets were quite different, especially in renal function and pathological features. The limited sample size may not fully represent the population of all IgA nephropathy patients.
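The claim that splits depend on rank rather than numeric size can be illustrated with a small sketch. It enumerates every way a threshold rule `value < t` can divide a set of patients and shows that the possible divisions are unchanged when the largest value is replaced by an extreme outlier. The function name and the data are hypothetical.

```python
def candidate_partitions(values):
    """Return the left-hand patient index sets of all splits `value < t`.

    Only the ordering of the values matters: each split separates the k
    smallest observations from the rest.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    partitions = []
    for k in range(1, len(order)):
        if values[order[k - 1]] != values[order[k]]:
            partitions.append(frozenset(order[:k]))
    return partitions


original = [3.1, 0.5, 2.2, 1.0, 7.4]
with_outlier = [3.1, 0.5, 2.2, 1.0, 74.0]  # largest value made ten times larger

print(candidate_partitions(original) == candidate_partitions(with_outlier))  # True
```

Because the candidate divisions are identical, any criterion that scores a split by the class labels on each side selects the same grouping of patients in both cases; only the numeric threshold reported for the outermost split would move.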
In conclusion, the decision trees on the basis of clinical indexes can help classify the types of TCM syndromes, and the pathological features may slightly improve the precision. Similar studies based on larger sample size or prospective studies would be useful for further exploration.

Ethical Approval
The study was approved by the Ethics Committee of Guangdong Provincial Hospital of Chinese Medicine.