Correlation between Neutrophil-to-Lymphocyte Ratio and Diabetic Neuropathy in Chinese Adults with Type 2 Diabetes Mellitus Using Machine Learning Methods

Objective Diabetic peripheral neuropathy (DPN) is one of the most frequent complications of diabetes mellitus, and the neutrophil-to-lymphocyte ratio (NLR) has been reported to reflect numerous inflammatory disorders, including diabetes. This study aimed to explore the correlation between peripheral blood NLR and DPN, and to evaluate whether NLR could be used as a novel marker for the early diagnosis of DPN among patients with type 2 diabetes mellitus (T2DM). Methods We reviewed the medical records of 1154 diabetic patients treated at Tongji Hospital Affiliated to Tongji University from January 2022 to March 2023. These patients had no evidence of acute infection or chronic inflammatory status within the past three months. The records included the patients' clinical, laboratory, and demographic characteristics. Finally, a total of 442 T2DM patients with reliable, complete, and accessible medical records were included, comprising 216 T2DM patients without complications (DM group) and 226 T2DM patients with DPN (DPN group). One-way ANOVA and multivariate logistic regression were applied to analyze data from the two groups, including peripheral blood NLR values and other biomedical indices. Following feature selection and data balancing, the cohort was divided in a 7 : 3 ratio into training and internal validation datasets. Models were trained using extreme gradient boosting (XGBoost) and support vector machine (SVM) methods. K-fold cross-validation was applied for model assessment, and accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC) were used to evaluate the models' discrimination and clinical applicability. The top-performing model was interpreted using Shapley Additive Explanations (SHAP).
Results The values of 24-hour urine volume (24H UV), lower limb arterial plaque thickness (LLAB thickness), carotid plaque thickness (CP thickness), D-dimer, and onset time were significantly higher in the DPN group than in the DM group, whereas the values of urine creatinine (UCr), total cholesterol (TC), low-density lipoprotein (LDL), alpha-fetoprotein (AFP), fasting C-peptide (FCP), and the nerve conduction velocity and wave amplitude of motor and sensory nerves on electromyography (EMG) were considerably lower than those in the DM group (P < 0.05 for each). NLR values were significantly higher in the DPN group than in the DM group (2.60 ± 4.82 versus 1.85 ± 0.98, P < 0.05). Multivariate logistic regression analysis revealed that NLR (P = 0.008, C = 0.003) was a risk factor for DPN. The multivariate logistic regression model scored 0.6241 for accuracy, 0.6111 for precision, 0.6667 for recall, 0.6377 for F1, and 0.6379 for AUC. Prediction models built with the machine learning methods XGBoost and SVM showed that NLR can predict the onset of DPN. XGBoost achieved an accuracy of 0.6541, a precision of 0.6316, a recall of 0.7273, an F1 value of 0.6761, and an AUC value of 0.6900. SVM achieved an accuracy of 0.5789, a precision of 0.5610, a recall of 0.6970, an F1 value of 0.6216, and an AUC value of 0.6170. Conclusions Our findings demonstrated that NLR is highly correlated with DPN and is an independent risk factor for DPN. NLR might be a novel indicator for the early diagnosis of DPN. XGBoost and SVM models have great predictive performance and could be reliable tools for the early prediction of DPN in T2DM patients. This trial is registered with ChiCTR2400087019.


Introduction
Diabetes mellitus is a significant global public health issue. It affects nearly 500 million adults globally, and its prevalence is rising sharply [1]. It is a chronic metabolic condition characterized by poor glucose homeostasis. The three main types of diabetes are as follows: type 1 diabetes (T1DM) results from autoimmune damage to the pancreas' insulin-secreting beta cells, type 2 diabetes (T2DM) is caused by long-term insulin resistance induced by lifestyle factors, and gestational diabetes mellitus occurs during pregnancy [1]. Diabetic peripheral neuropathy (DPN) is a significant secondary complication. It can manifest in both type 1 and type 2 diabetes, typically presenting as symmetric distal polyneuropathy that predominantly affects the lower limbs but can also involve the upper limbs and result in sensory loss. Chronic hyperglycemia leads to metabolic and microvascular alterations, resulting in DPN. DPN causes burning, tingling, cold, or electric shock-like pain in 50% of cases, often worse at night [2]. Pregabalin, gabapentin, and amitriptyline are first-line antineuropathic drugs, but they only partially relieve symptoms [3]. Furthermore, they do not address the underlying pathophysiological mechanisms [2]. The etiology of DPN is complex and not fully understood [4]. Known mechanisms include hyperglycemia, leading to nerve damage and bioenergy depletion. Hyperglycemia can activate multiple cellular pathways, including the polyol pathway, hexosamine pathway, PKC pathway, and the accumulation of advanced glycosylation end products, which can activate the inflammatory response and damage the cell membrane and organelles [4][5][6]. Strong experimental and clinical data indicate that immune system activation contributes to both painful and painless forms of DPN [7]. These processes include the infiltration of peripheral macrophages and lymphocytes [8,9], activation of microglia [10,11], activation of the kynurenine pathway [12], and proinflammatory cytokine signaling [12][13][14][15].
A recently discovered inflammatory biomarker, the neutrophil-to-lymphocyte ratio (NLR), integrates leukocyte differentials into a single variable, providing a more accurate predictive value than each parameter alone [16]. This biomarker combines two aspects of the immune system: the innate immune response, mainly due to neutrophils, and adaptive immunity, supported by lymphocytes [17]. It has been found to be closely associated with sepsis [17], pneumonia [17], malignancy [18,19], arterial diseases [20,21], diabetic nephropathy, diabetic retinopathy, and diabetic microvascular complications. The NLR is also related to DPN, though further exploration is needed [22,23]. Therefore, this study evaluated the difference in NLR between diabetic patients with and without neuropathy to better understand their relationship.
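As a minimal illustration of how the NLR collapses two leukocyte differentials into one variable, the sketch below divides the absolute neutrophil count by the absolute lymphocyte count. The function name and the example counts are hypothetical and are not taken from the study data.

```python
def neutrophil_to_lymphocyte_ratio(neutrophils: float, lymphocytes: float) -> float:
    """Return the NLR from two absolute counts given in the same unit (e.g. 10^9 cells/L)."""
    if neutrophils < 0:
        raise ValueError("neutrophil count cannot be negative")
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


# Hypothetical counts for illustration only (not patient data from this study).
example_nlr = neutrophil_to_lymphocyte_ratio(neutrophils=4.5, lymphocytes=1.5)
print(f"NLR = {example_nlr:.2f}")
```

Because both counts come from the same routine blood test and share a unit, the ratio is dimensionless and needs no additional laboratory work.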
Recently, machine learning (ML) has garnered attention and acceptance among physicians due to advancements in statistical theory and computer technology [24]. The healthcare field has been transformed by ML, a branch of artificial intelligence, owing to its quick, accurate, and reasonably priced computational conclusions [25]. ML is crucial for predicting various prevalent diseases, including kidney disease [24], T2DM [26], and cardiovascular disease in diabetic patients [27]. However, there are limited reports on machine learning in diabetic neuropathy. Therefore, our study also aimed to establish and validate predictive models for DPN using the extreme gradient boosting (XGBoost) and support vector machine (SVM) algorithms.
Enrolled patients exhibited poor blood glucose control, were managed with either oral hypoglycemic agents or insulin, and were free from recent infections. The medical information comprised 64 indicators, encompassing the patients' baseline features, routine blood examination, coagulation activity, blood biochemistry, urinary biochemistry, insulin measurement, tumor screening, ultrasound imaging results, and EMG results. The EMG results were used to categorize the participants into those with and without DPN. Figure 1 shows the entire research process.

Gathering of Data.
Patient records were screened using the corresponding inclusion and exclusion criteria.

Inclusion Criteria.
Diabetes was diagnosed based on the consultation guidelines of the World Health Organization as follows: fasting plasma glucose (FPG) ≥ 7.0 mmol/L [126 mg/dL] and/or a 2-hour post-glucose measurement ≥ 11.1 mmol/L [200 mg/dL] [28]. After ruling out other potential causes, DPN was defined as symptoms and/or signs of nerve damage in diabetic individuals [29]. Furthermore, neuropathy was confirmed by the EMG report.
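The glucose thresholds above can be written as a simple rule. The following is a minimal sketch, assuming measurements in mmol/L; the function names are hypothetical, and the conversion factor of about 18.016 mg/dL per mmol/L follows from the molar mass of glucose (roughly 180.16 g/mol) rather than from the study itself.

```python
from typing import Optional

# Approximate conversion factor for glucose: 1 mmol/L is about 18.016 mg/dL.
MG_DL_PER_MMOL_L = 18.016


def mmol_l_to_mg_dl(value_mmol_l: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dL."""
    return value_mmol_l * MG_DL_PER_MMOL_L


def meets_glucose_criteria(fpg_mmol_l: Optional[float] = None,
                           two_hour_mmol_l: Optional[float] = None) -> bool:
    """Return True if FPG >= 7.0 mmol/L and/or the 2-hour value >= 11.1 mmol/L."""
    fpg_positive = fpg_mmol_l is not None and fpg_mmol_l >= 7.0
    two_hour_positive = two_hour_mmol_l is not None and two_hour_mmol_l >= 11.1
    return fpg_positive or two_hour_positive


# Hypothetical measurements for illustration only.
print(meets_glucose_criteria(fpg_mmol_l=6.4, two_hour_mmol_l=11.8))
print(round(mmol_l_to_mg_dl(7.0)), round(mmol_l_to_mg_dl(11.1)))
```

The rule covers only the laboratory thresholds; the clinical definition of DPN and its confirmation by EMG are not captured by it.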

Exclusion Criteria.
The exclusion criteria were as follows: (1) patients who left the endocrinology and metabolism department within 48 hours, (2) those aged less than 18 years or more than 89 years, (3) patients with more than 30% missing personal data at admission, and (4) smokers.

Research Technology.
To minimize bias due to missing data, variables with more than 30% missing values were excluded from the final cohort, and the remaining missing values were imputed using the K-Nearest Neighbors (KNN) method. SPSS 26.0 software was used to analyze the data. Normally distributed measurement data were described using the mean ± standard deviation, while non-normally distributed quantitative data were described using the median and interquartile range. One-way ANOVA and nonparametric tests were used to compare the two groups. Multivariate logistic regression was applied to analyze the relationship between different NLR levels and the occurrence of DPN. Feature selection, data preparation, balancing, modeling, and assessment were carried out using Python. The dataset was randomly split into a training set and a validation set in a ratio of approximately 7 : 3. The training set was used to build the predictive model, and the validation set was used to verify and evaluate its performance. Two machine learning algorithms, SVM and XGBoost, were implemented in Python to model the relationship between related factors and the onset of diabetic neuropathy. After building the predictive DPN models with XGBoost and SVM, we screened the importance of features to understand which had the greatest impact on the prediction results. Permutation Importance was chosen to identify the most important features in each model, and the significance of these variables was discussed. Accuracy, precision, recall, F1-score, the confusion matrix, and the area under the receiver operating characteristic curve (AUC) during 5-fold cross-validation were used to evaluate the models' performance.
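The splitting scheme described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the study's code: the seed, the function names, and the use of 442 as the dataset size are hypothetical (the size after data balancing is not reported), and the split shown is a plain random split, not a stratified one.

```python
import random


def train_validation_split(n_samples, validation_fraction=0.3, seed=42):
    """Shuffle sample indices and split them roughly 7 : 3 into training and validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, hypothetical choice
    n_validation = round(n_samples * validation_fraction)
    return indices[n_validation:], indices[:n_validation]


def k_fold_indices(indices, k=5):
    """Yield (fit, held_out) index lists so that each sample is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]


# 442 is the cohort size before balancing; using it here is an assumption.
train_idx, validation_idx = train_validation_split(442)
print(len(train_idx), len(validation_idx))

for fit_idx, held_out_idx in k_fold_indices(train_idx, k=5):
    print(len(fit_idx), len(held_out_idx))
```

In this arrangement the validation indices are set aside before cross-validation begins, so the 5 folds are drawn only from the training portion.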

Ethical Considerations.
Written informed consent was obtained from each participant before the trial, and the research program was approved by the ethics committee of Tongji Hospital Affiliated to Tongji University, in compliance with the Declaration of Helsinki. The study has been registered with the Chinese Clinical Trial Registry, and the registry number is ChiCTR2400087019.

Model Establishment and Evaluation.
NLR-related DPN prediction models were developed using XGBoost and SVM. The optimal hyperparameters were determined through grid search and cross-validation, after which the importance of features was screened using Permutation Importance.
The associated confusion matrices are presented in Figure 3. The cells of each confusion matrix are the true positives, true negatives, false positives, and false negatives, where 0 represents diabetic patients without neuropathy and 1 represents diabetic neuropathy patients. XGBoost achieved an accuracy of 0.6541, a precision of 0.6316, a recall of 0.7273, an F1 value of 0.6761, and an AUC value of 0.6900. SVM achieved an accuracy of 0.5789, a precision of 0.5610, a recall of 0.6970, an F1 value of 0.6216, and an AUC value of 0.6170 (Figure 4). These results indicated that both models could predict diabetic neuropathy from NLR and related indices, with XGBoost outperforming SVM on every metric.
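Accuracy, precision, recall, and F1 all follow from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical: they are not read from Figure 3 but were back-calculated so that, for a validation set of 133 patients (about 30% of 442), they reproduce the XGBoost values reported above.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (class 1 = DPN), chosen to match the reported XGBoost metrics;
# they are an assumption, not values taken from the paper's Figure 3.
xgboost_like = classification_metrics(tp=48, fp=28, fn=18, tn=39)
for name, value in xgboost_like.items():
    print(f"{name}: {value:.4f}")
```

The AUC cannot be recovered this way, because it depends on the ranking of predicted scores and not on a single decision threshold.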

The Importance of Characteristic Variables.
Based on Permutation Importance, we found that in the XGBoost machine learning approach, UCr, a measure of renal function, was the feature with the highest mean score (1st). This was followed by disease duration (2nd). D-dimer (3rd) is an indicator of the presence of hypercoagulability and secondary hyperfibrinolysis. CA199 (4th) is a common marker of gastrointestinal tumors and can also be used to monitor the therapeutic effect and recurrence of malignant tumors. Albumin (5th) is the most important protein in human plasma, maintaining the body's nutrition and osmotic pressure (Figure 5).
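Permutation importance scores a feature by how much a fitted model's performance drops when that feature's values are shuffled across patients. The following is a minimal standard-library sketch of the idea. The toy data, the fixed threshold "model", and all names are hypothetical, and the study does not state which implementation or scoring metric was used.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: it only looks at feature 0."""
    return int(row[0] >= 10)


# Toy data: feature 0 determines the label, feature 1 is unrelated noise.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
labels = [int(row[0] >= 10) for row in rows]

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Shuffling the feature the toy model ignores leaves its accuracy unchanged, so that feature scores zero, while shuffling the informative feature lowers accuracy. For real models, scikit-learn provides `sklearn.inspection.permutation_importance`, which applies the same principle to any fitted estimator.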

International Journal of Endocrinology
In the SVM machine learning method (Figure 6), the most important feature was FCP (1st), which is secreted by islet beta cells and shares a precursor with insulin. NLR (2nd) is a combination of the two main components of the chronic inflammatory state (neutrophils and lymphocytes). Neutrophil levels are a marker of nonspecific inflammatory processes, and lymphocyte count indicates the function of immune regulation [30]. Patients with DPN often have high neutrophil and low lymphocyte counts, and the resulting elevated NLR represents the ongoing nonspecific inflammatory state in the body and the relative deficiency in immune function. Additionally, NLR is more stable and less susceptible to interference by related factors than other leukocyte measures, including neutrophil, lymphocyte, and leukocyte counts. Moreover, NLR has been shown to be an independent risk factor for diabetic microangiopathy, a pathophysiological process associated with DPN [28,29,31], which affects neurons and Schwann cells, causing neurodegeneration and leading to diabetic peripheral neuropathy. TC (3rd) and LDL (4th) are indicators of lipid metabolism, while HbA1C (5th) effectively reflects the average blood glucose level over the past 8 to 12 weeks.

Discussion
One of the prominent complications associated with diabetes is DPN, which is characterized by an insidious onset, slow progression, and initial symmetric tingling and numbness that can progress to foot ulceration and gangrene [32]. Therefore, early diagnosis and prevention of DPN are crucial for improving the quality of life of diabetic patients.
It has been established that T2DM and its associated complications are linked with inflammation and immunological dysfunction [4,33,34]. In diabetes, chronic inflammation has been implicated in the development and progression of DPN [35,36]. Hyperglycemia and oxidative stress are examples of stressors that might cause the production of NF-κB [36]. NF-κB activation stimulates the inflammatory response by increasing the expression of proinflammatory chemokines and cytokines such as C-C motif ligand 2 (CCL2), C-X-C motif chemokine ligand 1 (CXCL1), tumor necrosis factor (TNF), and interleukins (IL-1, IL-2, IL-6, and IL-8) [5,35,36]. The production of IL-1 disrupts insulin signaling and leads to the degradation of insulin receptor substrate-1 (IRS-1) by neutrophils, which has been shown to contribute to insulin resistance (IR) [6]. The well-known chemotactic properties of neutrophils may exacerbate inflammation and insulin resistance in T2DM by attracting other immune cells to adipose tissue [6]. Additionally, lymphopenia may be associated with T2DM and its consequences. Similar lymphopenia has been observed in various clinical and experimental studies involving individuals with microvascular, macrovascular, and other complications [37][38][39][40]. Elevated oxidative DNA damage and the death of lymphocytes in blood vessels might contribute to this condition.
In the present research, lymphocyte count was significantly lower in the DPN group, while neutrophil count was significantly higher. When comparing the NLR values of the DPN group to those of the DM group, they were significantly higher (2.60 ± 4.82 versus 1.85 ± 0.98, P < 0.05).
In the machine learning models for predicting the onset of diabetic neuropathy, feature importance analysis showed that the top five variables of XGBoost were UCr, disease duration, D-dimer, CA199, and albumin. The top five variables for SVM were FCP, NLR, TC, LDL-C, and HbA1C. Due to differences in model prediction performance and algorithm, NLR is not always the most crucial variable. Combining the results of the two models, we conclude that controlling inflammation is important in DPN patients. UCr can be used for early diagnosis of diabetic neuropathy and kidney disease [41]. In addition, the duration of diabetes and glycosylated hemoglobin levels affect the progression of diabetic neuropathy. The longer the duration of diabetes and the higher the glycosylated hemoglobin level, the greater the risk of complications from diabetic neuropathy [42]. Adequate nutrition is essential for tissue remodeling, and low serum albumin is a marker of malnutrition [43]. In recent years, the link between lipid abnormalities and DPN has received increasing attention [4]. Abnormal lipid metabolism leads to atherosclerosis, and NLR is also an independent predictor of the presence of carotid plaques [44]. This should be further studied in future research. This study also found that FCP, D-dimer, and CA199 are associated with DPN and can predict its development. Fasting insulin and fasting C-peptide represent islet function, with fasting C-peptide being more representative. Fasting C-peptide levels were inversely correlated with DPN in this research, which aligns with a prior study that found a negative correlation between C-peptide levels and cardiovascular autonomic neuropathy in T2DM patients [45]. A previous study in Korea concluded that the risk of diabetic neuropathy was related to the lower fasting serum C-peptide quartile after adjusting for multiple confounding factors [46]. However, a Danish study found a correlation between DPN and C-peptide levels ≥1550 pmol/L [47]. Scholars have also reported that C-peptide improved neuropathy in type 1 diabetic BB/Wor-rats. More research is necessary since type 1 and type 2 diabetes differ in the way C-peptide affects DPN [48].
In summary, combining multivariate logistic regression analysis and machine learning models, our results indicate that predicting DPN occurrence involves many factors, including UCr, duration, D-dimer, CA199, albumin, FCP, NLR, TC, LDL-C, and HbA1C.Our research shows that NLR is not only related to the occurrence of DPN but can also be used as a predictor to monitor the early occurrence of this disease.
According to prior research, the Diabetic Neuropathy Symptom Score (DNS) is an effective screening tool for diabetic neuropathy, and the Toronto Clinical Scoring System (CSS) can identify the presence and severity of diabetic peripheral sensory-motor polyneuropathy (DSP) [49]. However, these methods are easily influenced by patients' subjective perceptions, do not screen for asymptomatic neuropathy, and require more time, limiting their use in clinical DPN screening. In contrast, NLR can be easily computed from the neutrophil and lymphocyte counts in peripheral blood and is characterized by excellent stability, high repeatability, and low cost [50]. Therefore, NLR holds substantial clinical significance as an early diagnostic indicator for DPN, with the potential for early identification of asymptomatic DPN patients and significant improvement in patient prognosis.
The study has some limitations. First, no stratified research has been conducted on the association between NLR and DPN. Therefore, further studies are needed in which individuals are categorized into different groups based on the severity of DPN. Second, our sample size is small and limited by region and ethnicity, which may lead to biases in these statistical results. Thus, multicenter studies are required to assess the use of NLR for DPN prediction in more detail.

Figure 1: Flowchart showing the patients included in the study.

Figure 2: (a) ROC analysis of NLR for predicting diabetic peripheral neuropathy (area under the curve = 0.638) and (b) average NLR values in the diabetic neuropathy group and the diabetic group. DM = diabetes, DPN = diabetic peripheral neuropathy, NLR = neutrophil-to-lymphocyte ratio. "∘" indicates that the individual values are more than 1.5 to 3 times the interquartile range (box height) from the bottom line of the box plot. "*" indicates that the individual value is more than 3 times the box height from the top line of the box plot.

Figure 4: ROC curves of the machine learning algorithms for predicting the relationship between NLR and the development of diabetic neuropathy. (a) Extreme gradient boosting (XGBoost) and (b) support vector machine (SVM).

Figure 5: Feature significance ranking of the incorporated features of the XGBoost model.

Table 1: Cluster features and test results.
† Significant difference between the two groups.

Table 2: Results of logistic regression analysis of DPN.