Features characterising cardiac autonomic neuropathy in diabetes using ensembled classification

OBJECTIVE
Using supervised machine learning to classify the severity of cardiovascular autonomic neuropathy (CAN). The aims were 1) to investigate which features contribute to characterising CAN 2) to generate an ensembled set of features that best describes the variation in CAN classification.


METHODS
Eighty-two features from demographic, beat-to-beat, biochemical, and inflammation were obtained from 204 people with diabetes and used in three machine-learning-classifiers, these are: support vector machine, decision tree, and random forest. All data were ensembled using a weighted mean of the features from each classifier.


RESULTS
The 10 most important features derived from the domains: Beat-to-beat, inflammation markers, disease-duration, and age.


CONCLUSIONS
Beat-to-beat measures associate with CAN as diagnosis is mainly based on cardiac reflex responses, disease-duration and age are also related to CAN development throughout disease progression. The inflammation markers may reflect the underlying disease process, and therefore, new treatment modalities targeting systemic low-grade inflammation should potentially be tested to prevent the development of CAN.


SIGNIFICANCE
Cardiac reflex responses should be monitored closely to diagnose and classify severity levels of CAN accurately. Standard clinical biochemical analytes, such as glycaemic level, lipidic level, or kidney function were not included in the ten most important features. Beat-to-beat measures accounted for approximately 60% of the features in the ensembled data.


Introduction
It is estimated that diabetes affects more than 535 million people globally, with projections suggesting that upwards of 783 million people will be affected in 2045 (Sun et al., 2022).As a result, clinical relevant micro-and macro-vascular complications are also expected to increase (Sinnreich et al., 2005).Among the lesser understood microvascular complications, is diabetic autonomic neuropathy, which among others affects the cardiovascular systems (Vinik et al., 2013).Though autonomic neuropathy of the cardiovascular system (CAN) increases the risk of silent myocardial ischemia and mortality (Feldman et al., 2019;Pop-Busui, 2012), it is still not clearly understood why some develop this, while others do not.
Precision diagnostics tailor clinical diagnosis to the individual by use of biomedical information coupled with machine learning.This approach can facilitate earlier diagnosis and individual risk (Chung et al., 2020).Clinically, this has been used to gain insights into diabetic sensorimotor polyneuropathy (Haque et al., 2022), general complications (Ljubic et al., 2020), and recently CAN (Abdalrada et al., 2022).
The use of machine learning allows for the selection of features from large samples of complementary clinical data.To improve generalisability and minimise overfitting to a particular training set, feature selection techniques must be employed.Support vector machines (SVMs) are well-established algorithms for performing classification on both linear and non-linear data, often in clinical settings (Kumari and Chitra, 2013;Othman, 2011).A common issue involving SVMs is their operation as a 'black box' (Cherkassky and Dhar, 2015).For this reason, medical analysis is often performed with hierarchical models (Burgess et al., 2000;Wang and Gatsonis, 2008) to eliminate the uncertainty in the weighting of variables that produce a classification.A common hierarchical model (in this case, a hierarchical clustering model) is a decision tree, which has merited medical applications in the diagnosis of type 2 diabetes (Habibi et al., 2015;Al Jarullah, 2011).This model classifies data by producing a condition on individual features or nodes to produce two sub-nodes that stem from each decision.By following this tree from the topmost node and following the numerical conditions on each node until an end node is met, a prediction can be made for each feature set.In reducing the issue of overfitting with decision trees, the technique of ensemble learning is often applied.Ensembling such trees produce a random forest (Breiman, 2001), which has similarly been applied to the risk prediction of type 2 diabetes (Xu et al., 2017).With the introduction of multiple models, a metric for aggregating the importance's over each model can be utilised to preserve the features that are consistently useful in modelling.Such approaches are referred to as ensembling, much like the intuition of random forests, and have seen both theoretical (Seijo-Pardo et al., 2015) and commercial use (Shah and Peretiatko, 2021).
We hypothesised that the severity of CAN may be classified based on clinical features, heart rate variability measures, and biochemical measures including inflammatory markers using a combination of supervised machine learning techniques and aggregation of the best performing features.
The aims were to 1) investigate which clinical and paraclinical features can help explain the characteristics of CAN, and 2) generate an ensembled set of features that best describes the variation in CAN severity.

Study population
This cross-sectional dataset was comprised from two cohorts of people with diabetes recruited from the Department of Endocrinology, Aalborg University Hospital: Cohort 1: Cross-sectional study investigating cardiovascular and autonomic complications in consecutive patients attending the annual health visit with unknown status of diabetic neuropathy (type 1 diabetes, n = 56; Type 2 diabetes, n = 100) and Cohort 2: Baseline data from a randomized controlled trial (RCT) (Brock et al., 2019) in 48 participants with type 1 diabetes and verified peripheral neuropathy according to the Toronto criteria.The studies comprising the current dataset were conducted in accordance with the Declaration of Helsinki, local regulations, and International Conference on Harmonization Good Clinical Practice guidelines.The North Denmark Region Committee on Health Research Ethics, Denmark approved both studies (N-20170045 and N-20130077).Furthermore, Cohort 2 was approved by the Danish Health and Medicines Authority (EudraCT No.: 2013-004375-12).All participants gave their written informed consent before participating in any data recording.The clinical characteristics of the cohorts is available in Table 1.

Clinical measures
Characteristics of the dataset consist of age, sex, disease duration, and smoking habits.Height was measured using a stadiometer (Seca GmbH & Co. KG., Hamburg, Deutschland).Weight was assessed with a standardised calibrated scale (BWB-800AS, Tanita, Arlington Heights, Illinois, USA).Body mass index was calculated based on height and weight.Cardiac measures (Heart rate, resting systolic, and diastolic blood pressures) were obtained in a seated position using a blood pressure monitor (Intellisense Ò , Omron Healthcare, Inc., Bannockburn, Illinois, USA).Additionally, a vibration perception threshold of the first toe was performed on the side of the dominant hand.

Assessment of cardiovascular autonomic neuropathy
All participants underwent cardiovascular autonomic reflex testing in the morning (between 8:00 and 10:00 a.m.) in a consistently lit, quiet laboratory.This was done to minimize variance in circadian rhythm The diagnosis of CAN was performed using either orthostatic hypotension measure or The Vagus TM device (Medicus Engineering ApS, Aarhus, Denmark).Validation of the Vagus TM has been described elsewhere (Ejskjaer et al., 2008;Fleischer et al., 2011).Following 10 minutes of rest a CAN score was calculated based on three recordings using theVagus TM device: electrocardiographic recordings at rest, and during cardiovascular autonomic reflex tests consisting of deep breathing (expiration/inspiration ratio), The valsalva ratio, and postural change (30:15 supine to standing ratio).I To help mitigate missing data if one of the tests were not completed, the CAN score was estimated from the remaining two procedures.The state established CAN (CAN 2) was defined if two or more abnormal reflex tests were registered, or orthostatic hypotension defined as > 20 mmHg reduction in systolic blood pressure on standing; borderline CAN (CAN 1) was selected if one abnormal reflex test was present, and no CAN (CAN 0) was used when tests were within the normal range of the specific age-dependent cut-off values (Spallone et al., 2011).

Heart rate variability
Holter monitoring was performed using either the ePatch electrocardiographic Recorder (BioTelemetry Technology ApS, Hørsholm, Denmark) or Holter monitor (Lifecard CF; Del Mar Reynolds, Spacelabs Healthcare, Snoqualmie, WA, USA).This monitoring was performed for 24 hours.The monitor was mounted in the morning, and heart rate variability parameters were computed for the entire 24 hour recording period.Data was analysed using Cardioscope TM (SmartMedical, Gloucestershire, United Kingdom) for the ePatch, or Pathfinder (Software revision B code; Spacelabs Healthcare, Hertford UK) for the Lifecard CF system.The timederived measures of HRV were: Standard deviation of all normal RR (SDNN), standard deviation of the averages of RR (SDANN), mean standard deviation of the averages of RR for each 5-minute interval (SDNNi), and root mean square of the successive differences (RMSSD).The frequency domain measures of HRV were: Very low frequency (VLF), low frequency (LF), and high frequency (HF).The frequency measures were analysed and interpreted according to (Malik et al., 1996).
Additionally, cardiac vagal tone (CVT), was analysed using the ProBioMetrics online app version 1.0 (ProBioMetrics, Kent, UK).Artifacts in the recordings were defined as a change in two succeeding QRS complexes exceeding 15 beats per minute, files were inspected and cleaned by removing five heartbeats before and after to derive and artefact free CVT.The data was removed from analysis if the number of removed heartbeats exceeded 20%.

Inflammatory markers
On the day of examination blood samples were drawn from the cubital vein of fasting participants.The blood was centrifuged at 2500 rpm for 10 minutes, the serum was stored in a À80 °C freezer.Biomarker concentrations in plasma samples were analysed using the V-PLEX Neuroinflammation Panel 1 Human Kit (Meso Scale Diagnostics Ò [MSD], Gaithersburg, MD, USA) on a MESO QuickPlex SQ 120 instrument (MSD) according to the manufacturer ´s specifications.The analysis was conducted at the CoreLab of Steno Diabetes Center Copenhagen.Serum concentrations of cytokines (interferon-(IFN)-c, tumor necrosis factor (TNF)-a, TNF-b, eotaxin, eotaxin-3, IFN-c-induced protein (IP)-10, interleukin-(IL-) À1a, À1b, À2, À4, À5, À6, À7, À8, À10, À12/-23p40, À13, À15, À16, À17A, macrophage-derived chemokine (MDC), monocyte chemoattractant protein (MCP)-1, MCP-4, macrophage inflammatory protein (MIP)-1a, MIP-1b, CC chemokine ligand 17 (CCL17), and CRP.Sample values below the detection limit of the assay were assigned a value of the detection limit divided by p 2. If more than 30% of the measured samples for any given biomarker were below the detection limit, the biomarker was excluded from the analysis.Likewise, samples with a coefficient of variation > 30% between duplicate measurements were excluded from the analysis (Croghan and Egeghy, 2003).
Table 1 Demographic and variables of the study cohorts, divided over the three cardiovascular autonomic neuropathy (CAN) grades 0, 1, and 2. A one-way ANOVA using CAN as a categorical independent variable to test differences of variables, gender and diabetes type was tested using Pearson's Chi-squared test.For a description of cardiovascular autonomic neuropathy (CAN) see: ''Assessment of cardiovascular autonomic neuropathy" For a description of biochemical measures see: ''Clinical biochemistry" For a description of heart rate variability see: ''Heart rate variability".Body mass index (BMI), alanine aminotransferase (ALAT), low-density lipoprotein (LDL), high-density lipoprotein (HDL), estimated glomerular filtration rate (eGFR), Root mean square of successive differences (rMSSD), standard deviation of the averages of RR (SDANN), standard deviation of the averages of RR (SDNN), mean standard deviation of the averages of RR for each 5-minute interval (SDNNi), very low frequency (VLF), low frequency (LF), high frequency (HF), The inflammation markers are tumour necrosis factor (TNF) and IL-are serum concentrations of cytokines (interleukin).

Pre-processing
Any missing fields were filled with the mean value from the feature of the respective diabetic disease (type 1 or type 2).After preprocessing missing values by averaging the remaining data over diabetes type two datasets are yielded: 1) The full dataset, which comprises all demographic, heart, blood sample, and inflammation data (n = 82) and 2) the reduced dataset, which only consists of the most important features produced by the feature extraction techniques (n = 10), this is visualised in Fig. 1.Each feature selection technique hence generates a separate reduced dataset that is independently analysed.
The use of feature selection in dataset reduction has gained success when the dataset is reduced to a minimum size of ffiffiffiffi N p (Hastie et al., 2009) or log 2 ðN þ 1Þ (Breiman, 2001) for use in random forest classifiers.This size may require an increase to 3 log 2 ðN þ 1Þ with many categorical (discrete) variables.However, with a dimensionality of 82 features, Breiman 2021 (Breiman, 2001) suggests that ffiffiffiffiffiffi 82 p % 9, or log 2 82 þ 1 ð Þ%6 features are required for a suitable model.A total of 10 features are thus offered to capture greater dataset variance and offer more potential for the exploration of the features that may influence CAN Score.

Data analysis
All analysis was performed using the Python3 scikit-learn (Pedregosa et al., 2011) library and all figures were produced using the ''matplotlib" (Hunter, 2007) library.A variety of classification techniques documented below are investigated using the full dataset with a 20% test-train split, with all results (accuracies, features, and confusion matrices) averaged over 1000 iterations.The key features that contribute to each classifier are then documented and aggregated to establish which clinical trials are most important in classifying the CAN scores of a patient.

Classification techniques with embedded methods
These methods incorporate feature selection directly into the training of the model (Lal et al., 2006), such as with regularisation (Zhang et al., 2020), where certain variable coefficients (the 'weight' applied to the feature) are intentionally set to 0 to reduce overfitting.This has the innate effect of eliminating irrelevant features.The following section investigates a set of classifiers with embedded methods applied.

Support vector machines
The SVM model is generated using a randomised grid search with 5-fold cross-validation.This grid search evaluated SVMs with linear, polynomial (degrees 2-5), and radial basis function (RBF) kernels for regularization costs of 1 to 1000 logarithmically.The product of the grid search was an RBF kernel with gamma equal to 1e-4 and a regularization cost of 1.To perform feature extraction, the permutation importance (Altmann et al., 2010) of the best model in fitting the test data to its labels is evaluated.This allows for the extraction of unbiased feature importance with non-linear kernels.

Decision trees and ensemble learning
Performances with both an individual decision tree (to yield an interpretable diagram) and a random forest (for improved classifier accuracy) are assessed.The random forest is implemented with 50 trees with a maximum depth of 8 for minimising overfitting and, like the single decision tree, utilise the mean decrease in impurity to extract the 10 most important features.Importantly, for both the decision tree and random forest classifiers, the comparatively smaller sample size of CAN 1 patients is compensated for by applying class weight balancing, thereby penalising mispredictions of CAN 1 more harshly than CAN 0 or CAN 2. This results in the classifier prioritising the learning of CAN 1 predictions.

Evaluation of classifiers
Two metrics are used for the classifier evaluation: both its test/training accuracy and F1-score.Whilst the former helps with identifying overfitting, the latter is useful for assessing performance with imbalanced classes (Cruz et al., 2016).This is particularly relevant for this dataset, as the proportion of CAN 1 patients is significantly smaller than the other classes.

Feature importance ensembling
For this method, a similar approach to Rajiv Shah et al. (Shah and Peretiatko, 2021) is taken, but instead weighing the scores by model performance, to prioritise features that contribute to a more successful model.
Algorithmically, this is described as follows: 1. Normalise importances for each model (of 10 features) such that they sum to 1 2. Multiply all importances for each model by test performance of model 3. Sum scores for each feature and list in descending order Which results in the pipeline illustrated in Fig. 2.This creates a set of features that is, effectively, a weighted sum of the best features from each model.This resultingly encourages generalisability of results, as these features should be prevalent irrespective of the classification tool used.

Wrapper methods
Wrapper methods perform feature selection by generating increasingly large subsets (recursively adding features) of the original feature space up to a limit until the best combination of features is found.This has seen medical applications using supervised learning with Naïve Bayes amongst other classification algorithms (Karegowda et al., 2010;Singh and Singh, 2021).

Receiver operator characteristic
Following the extraction of the 10 best features across all models through ensembling, the sensitivity and specificity of the best classifier for a binary comparison of CAN classes were derived using receiver operator characteristic (ROC) curves.A brute force, wrapper approach was implemented by exhaustively generating combinations of variables up to 3 features and evaluating the area under the curve (AUC).

Support vector Machine
The training and test accuracies for the support vector machine are for the full dataset: 92.0% training and 55.0% testing with an F1-score of 53.6%.For the reduced dataset, the accuracies were 59.1% for training and 55.4% for testing with an F1-score of 56.4.The 10 best performing features were: rMSSD, disease duration, SDNN, CVT, VEGF-D, VLF, LF, IL-7, height, and Flt-1.These results are also visualised in Fig. 2. By retraining the model on the 10 most significant features, a similar test accuracy (a 0.4% improvement) can be achieved with a large reduction (%33%) in training accuracy.This illustrates that overfitting is significantly reduced, allowing for a more generalisable model.

Decision tree
The training and test accuracies for the decision tree are for the full dataset: 97.0% training and 47.3% testing with an F1-score of 47.1%.For the reduced dataset, the accuracies were 92.7% training and 50.7% testing with an F1-score of 51.8%.The 10 best performing features were: rMSSD, disease duration, HF, VEGF-D SDNN, IL-2, CCL17, age, TSH, and CVT.The four highest decision tree branches are displayed in Fig. 3 and the features are also visualised in Fig. 2. A reduction in training accuracy of %4.3% permits an increase in test accuracy of %3.4%.

Random forest
The training and test accuracies for the random forest are for the full dataset: 58.0% training and 61.6% testing with an F1score of 61.7%.For the reduced dataset, the accuracies were 58.9% training and 59.7% testing with an F1-score of 60.1%.The 10 best performing features were: rMSSD, disease duration, HF, SDNN, CVT, IL-5, VEGF-D, LF, IL-2, and transferrin.These results are also visualised in Fig. 2. The random forest classifier uses the 'out of bag' (OOB) error as a means of quantifying the test and training accuracies.This computes the inaccuracy in predicting a training sample using only the trees that did not have this training sample in their bootstrap.The reduced dataset produced a 0.9% increase in training accuracy with a 1.9% decrease in test accuracy.Due to the similar magnitudes of test and training accuracy, however, it is not the case that this classifier is overfitting.Instead, a relatively high accuracy can be maintained despite the smaller input space.

Feature importance ensembling
Ensembling the features from the three classifiers resulted in the 10 most important features being: rMSSD, disease duration, SDNN, VEGF-D CVT, HF, IL-2, CCL17, LF, and age.Looking at the proportion of total feature importance beat-to-beat data accounted for approximately 60% of the data followed by demographic data and inflammatory markers.Notably, no features from the standard biochemical blood sample dataset were part of the top 10 ensembled features.The 10 most important features were stratified according to the CAN score in Table 2.

Receiver operator characteristic curves
Evaluating the random forest classifier with binarized classes and performing feature selection with an exhaustive search (up to a maximum of n = 3 features) yields the results shown in Supplementary Fig. 1.The extracted features, up to n = 3, the sensitivity and specificity at the optimal discriminator (highest AUC of all feature combinations), as well as AUC.These results are summarised in Table 3. ∑ Fig. 2. Training and testing accuracies of the three classifiers support vector machine, decision tree classifier, and random forest classifier in the full (n = 82) and reduced (n = 10) datasets.The method of ensembling takes the best performing features from each classifier accounting for test accuracies.This results in the 10 most important ensembled features.Root mean square of successive differences (rMSSD), disease duration (DD), high frequency (HF), vascular endothelial growth factor (VEGF)-D, standard deviation of the averages of RR (SDNN), Interleukin (IL)-2, À5, CC chemokine ligand (CCL)17, thyroid stimulating hormone (TSH), cardiac vagal tone (CVT), very low frequency (VLF), low frequency (LF), vascular endothelial growth factor receptor 1 (Flt-1).

Discussion
Early detection of CAN is clinically relevant, as progression may be halted or even reversed through intervention.To classify the severity of CAN, we identified the 10 most important features using the ensembling method: rMSSD, disease duration, SDNN, VEGF-D, CVT, HF, IL-2, CCL17, LF, and age.Some methods have shown that inflammatory markers and heart rate-derived measures associate with neurocardiac function and pose possible screening tools (Wegeberg et al., 2020a(Wegeberg et al., , 2020b;;Wegeberg et al., 2022).The variables rMSSD, SDNN, CVT, HF, and LF are all heart rate variability derived time-and frequency domains.Parasympathetic activity affects primarily the time-derived measure RMSSD along with the variability of the high frequency components of frequency data (0.15-0.4 Hz) (Pop-Busui, 2012).These features have all previously been reported to be diminished by CAN, due to parasympathetic withdrawal causing sympathetic dominance (Wegeberg et al., 2020a;Wegeberg et al., 2020b).Disease duration and age were expected to contribute to CAN development throughout diabetic disease progression.Especially in type 2 diabetes, age and disease duration are less interdependent in comparison to type 1 diabetes.VEGF-D, IL-2, and CCL17 are all markers of inflamma-tion.Increased VEGF-D has been reported to be overexpressed in diabetic atherosclerosis (Wong et al., 2011), which is associated with reduced heart rate variability (Jaiswal et al., 2013).IL-2 has been associated with changes in sympathetic activity, but the subtle magnitude of the marker limits its use as a clinical biomarker (Guinjoan et al., 2009;Tio et al., 2000).CCL17 level is increased with age and correlates with cardiac dysfunction (Zhang et al., 2022).

Evaluation of classifiers
Although many models exist for performing supervised learning on a clinically obtained dataset, the models in this paper are selected for the following reasons: the SVM is a well-established classifier that has seen countless medical implementations and is known to offer high performance.However, due to its functionality as a 'black box,' other methods are preferable, for example if visualising the numerical 'cut-offs' that classify a patient is desired.In this case, a decision tree model is preferred.This model is known to suffer from overfitting, so the use of a random forest model is often used to negate this problem, although this comes at the cost of visualisation.Since all models offer positives and negatives to this task, and all may produce varying features, it is important to pool their responses to produce a set of features that persists across all models (ensembling), to maximise generalisability.

Quantitative analysis
The use of a classifier for feature selection is entirely dependent on its performance.With performance ranging from %51% with decision trees to %60% with ensembled random forests, we accept that this model is insufficient as a clinical tool if an accurate diagnosis needs to be performed.Instead, this model is offered for explorative purposes.Randomly guessing a patient's CAN score would yield an accuracy of 33.3%, and ''strategically" guessing the CAN score based on the majority class (CAN 0) would yield 45.1% accuracy.For this reason, having demonstrated that the best model offers a 15%-point improvement in classification accuracy, these features may contribute to an understanding of the underlying mechanisms resulting in CAN.The training accuracy is reduced in all cases despite maintaining or increasing the test accuracy, this illustrates that the features selected are suitable for generalising the model, thereby reducing overfitting to the dataset.

Table 2
Ten most important features derived from ensembling and stratified by cardiovascular autonomic neuropathy (CAN) Score.The features are ordered in order of importance.Root mean square of successive differences (rMSSD), standard deviation of the averages of RR (SDNN), vascular endothelial growth factor (VEGF)-D, cardiac vagal tone (CVT), high frequency (HF), Interleukin (IL)-2, CC chemokine ligand (CCL) 17, low frequency (LF).Generating the confusion matrices for both the full and reduced datasets for the best performing classifier (random forest) allows for a direct comparison of the inter-class classification accuracies.From Fig. 4, both the full and reduced datasets perform well at classifying CAN 0 and CAN 2, with accuracies of 85.2% and 67.5% for the full dataset and 76.1% and 68.2%, for the reduced dataset.This demonstrates that the full dataset is better when categorising CAN 0, offering a %9% improvement, however, both classifiers offer similar performance with regard to categorising CAN 2.

CAN
Both classifiers struggle with the classification of CAN 1 as the full dataset offers a 7% accuracy, with 66.6% of the predictions incorrectly assigned to CAN 0. This may reflect the clinical challenge in the early detection of CAN 1.Similarly, the reduced dataset has an 18.7% classification accuracy for CAN 1, with 55.8% of these predictions assigned to CAN 0. From these results, the difficulty in distinguishing between CAN 1 and CAN 0 is evident, although the reduced dataset does offer a considerable, threefold improvement of %12%.This suggests that feature selection 'de-noises' the classifier well.

Clinical implications
To our surprise, none of the standard clinical biochemical analytes, such as those describing the glycaemic level, lipidic level, or kidney function were included in the most important features accounting for prediction accuracies in any of the three classifiers, nor in the ensembled model.This may be due to a non-linear development of diabetic microvascular complications developing independently of each other.Furthermore, this may underscore the importance of recognizing systemic inflammatory markers in the investigation of CAN and is supported by other findings that associate interleukin markers with dysfunctional heart rate variability (Wegeberg et al., 2022).The clinical classification of CAN 1 is difficult in both the full and reduced dataset, which underlines the complexity of this subsection of the patient population and emphasizes the silent nature and multifaceted underlying pathophysiology.However, our findings implies that cardiac reflex responses should be monitored closely to diagnose and classify severity levels of CAN accurately.

ROC curve analysis
We used feature selection to find the features that best described the distinction between different CAN stages, but no more than three.Interestingly, different combinations of features appear to provide the best distinction between the different stages, presenting with highly sensitive and specificity scores at the optimal disclination point.As expected, the group with severe CAN (CAN2) were easier to discriminate from no or less severe CAN (CAN 0 and CAN 1), providing higher AUC based on inflammatory markers and age or disease duration, though interestingly, without taking into consideration heart rate variability measures.For distinguishing less severe CAN (CAN 1) from no CAN (CAN 0), which is useful for early detection of cardiovascular autonomic neuropathy, a heart rate variability element was, however, necessary to achieve optimal discrimination.Additionally, age appears to be a prominent factor when distinguishing no CAN from the presence of any type of CAN, suggesting CAN arises later in life.

Limitations
The recruitment in the cohorts were vulnerable to selection bias as in particularity individuals with a surplus of resources, might be overrepresented.The choice to not include diabetes type into the model was done to minimise the risk of overfitting data since a large part of the group of people with CAN 2 have type 1 diabetes.We hypothesise that while the underlying differences between diabetes type one and two are very different the effects of inadequate management of the disease will have similar systemic bodily effect.CAN has previously been shown to be independent of diabetes type (Pop-Busui, 2012).In machine learning, the number of features that can be utilised is closely related to the number of subjects that are being analysed.This is observable in the overfitting in the full dataset parts of the analysis, but mitigated by selecting fewer features resulting in closer resemblance in the testing and training accuracies.A partial explanation for the lower classification accuracies of CAN 1 is the class imbalance with CAN 1 having the lowest number of participants (22.55%) and CAN 0 having the largest (45.1%).

Conclusion
Surprisingly, none of the standard clinical biochemical analytes, describing the glycaemic level, lipidic level, or kidney function were included in the ten most important features associated with CAN.In contrast, the most important features are beat-to-beat measures accounting for approximately 60% of the feature importance of the ensembled data.Unsurprisingly, the extracted beat-tobeat measures are primarily suggestive of reduced parasympathetic activity.Furthermore, disease duration and age are expected to correlate with CAN development, since these increase with disease progression.Most importantly, our models indicate a higher impact of markers reflecting systemic inflammation.Thus, new treatment modalities targeting systemic low-grade inflammation could potentially serve as a therapeutical target in the attempt to halt the development and progression of CAN.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1 .
Fig.1.Pre-Processing pipeline and number of features from each subdomain of the data.

Fig. 3 .
Fig.3.Structure of an example decision tree.Taking the left path from each node evaluates its rule as true and false otherwise.Darker coloured boxes for each colour correspond to the certainty of prediction.The grey boxes at the bottom of the tree denote truncation, as showing the maximum depth of the tree would hinder readability.Root mean square of successive differences (rMSSD), vascular endothelial growth factor (VEGF)-D, low frequency (LF), standard deviation of the averages of RR (SDNN), disease duration (DD), Interleukin (IL)-2, high frequency (HF), cardiac vagal tone (CVT).

Fig. 4 .
Fig. 4. Class prediction accuracies (confusion) for A) full (n = 90) and B) reduced (n = 10) dataset based on demographics, blood data, heart data, and inflammation data, and averaged across 1000 random forest classifiers.Each confusion matrix shows the true and predicted accuracy across the left to right diagonal of the cardiovascular autonomic neuropathy (CAN).