A Machine Learning Model for Early Detection of Diabetic Foot using Thermogram Images

Diabetic foot ulceration (DFU) and amputation are a cause of significant morbidity. The prevention of DFU may be achieved by the identification of patients at risk of DFU and the institution of preventative measures through education and offloading. Several studies have reported that thermogram images may help to detect an increase in plantar temperature prior to DFU. However, the distribution of plantar temperature may be heterogeneous, making it difficult to quantify and utilize to predict outcomes. We have compared a machine learning-based scoring technique, combined with feature selection and optimization techniques and learning classifiers, to several state-of-the-art Convolutional Neural Networks (CNNs) on foot thermogram images and propose a robust solution to identify the diabetic foot. A comparatively shallow CNN model, MobilenetV2, achieved an F1 score of ~95% for a two-feet thermogram image-based classification, and the AdaBoost classifier used 10 features and achieved an F1 score of 97%. A comparison of the inference time for the best-performing networks confirmed that the proposed algorithm can be deployed as a smartphone application to allow the user to monitor the progression of the DFU in a home setting.

of DFU [11]. Skin temperature monitoring emerged during the 1970s, with "asymmetry analysis" proving to be very effective in identifying ulcers at an early stage [12]. A temperature difference of 2.22°C (4°F) over at least two consecutive days could be used as a threshold for therapy to prevent DFU [8]. The system correctly identified the development of DFU in 97% of participants, with an average lead time of 37 days [13].
Thermography is a rapid non-invasive imaging technique to quantify thermal changes in the diabetic foot [13]. Several studies have proposed thermogram-based techniques for identifying those at risk of DFU [2,3,14] by identifying a characteristic thermal distribution in the infrared image. The control group had a specific butterfly pattern [15] compared to a large variety of spatial patterns in the patients with diabetes [16,17]. Whilst it is possible to assess thermal changes in one foot compared to the contralateral foot [18][19][20][21], if both feet have thermal changes without a butterfly pattern, then one foot cannot act as a reference. Asymmetry cannot be detected when both feet show a large temperature change with identical spatial distributions. An alternative approach is to calculate the temperature change with respect to the butterfly pattern of a control group [22][23][24].
Machine learning (ML) techniques have been widely used for automatic image classification using feature extraction, feature ranking, and using different ML models, such as Artificial Neural Networks (ANN), k-nearest neighbors (KNN), and Support Vector Machines (SVM) [25][26][27]. The change of focus from traditional paradigms in machine learning to Deep Learning (DL) is the product of the high accuracy achieved through its large learning structures, enabling DL to obtain deeper data traits. The need for large data size and high computational complexity can be addressed using transfer learning on pre-trained networks. Whilst it is reasonably straightforward to distinguish the foot thermogram of a control subject with a specific spatial pattern, the distribution in a diabetic foot without a specific spatial pattern is more challenging, especially as the spatial distribution may change and the detection of a temperature rise in the plantar region is important for diabetic patients.
Several studies [22,23,28-35] have attempted to extract features to identify the hot region in the plantar thermogram, to identify tissue damage or inflammation. Etehadtavakol et al. [35] proposed a method called lazy snapping to extract the extreme temperature areas in the thermogram images which can easily differentiate the coarse and fine-scale change. A thresholding method was used to identify the highest temperature areas from the plantar region [22], while Gururajarao et al. [34] used an active contour model of plantar segmentation and a thresholding method to extract the highest temperature points. Adam et al. [33] used Discrete Wavelet Transformation (DWT) and higher-order spectra (HOS) to derive several coefficients from the characteristics of texture and entropy. A double density-dual tree-complex wavelet transform (DD-DT-CWT) was used to decompose the image and extract several key features [32].
Saminathan et al. [31] segmented the plantar area into 11 regions using region-raising and extracted texture characteristics to classify it into a normal or ulcer group. Most of these works were reported on small private datasets and utilized post-processing techniques, which might not generalize to a different dataset; the real-time applicability and inference time were not reported. Moreover, the performance of these methods was not comparable to that of the machine learning based techniques.
Very few studies have applied deep learning (DL) techniques to classify thermogram images from controls and diabetic patients. Maldonado et al. [30] utilized a DL technique to segment the thermogram of the plantar area to classify ulceration or necrosis. Hernandez et al. [23] proposed a quantitative thermal change index (TCI) to measure the thermal change in the plantar region of diabetic patients to distinguish patients from controls. Hernandez et al. [23,29] utilized the "Plantar Thermogram Database" of 334 foot thermogram images and used TCI to classify subjects into Class 1 to 5 based on the spatial temperature distribution and temperature range. Cruz-Vega et al. [28] also proposed a DL technique to classify the images of the "Plantar Thermogram Database" into two classes at a time, but the technique is questionable as it cannot be used for clinical decision making, and the applicability of such a solution for a smartphone application is not discussed. We have utilized an available dataset to classify control and diabetic groups, developed a novel technique to automatically classify the thermogram images, and compared the outcome to a 2D deep learning technique. Moreover, the light architecture and machine learning model are deployable in smartphones.
The major contributions of this paper are:
• Comparative evaluation over the state-of-the-art 2D CNN models and image enhancement techniques for the detection of diabetic foot with high accuracy.
• A detailed investigation of the relevant features to improve the detection performance when used as input to traditional classifiers.
• An investigation of feature selection and optimization techniques and classification models to maximize detection performance utilizing light classifiers.
Section II discusses the methodology, section III presents the results and discussion and section IV presents the conclusions and proposes topics for future research.
Figure 1 shows the complete system block diagram. The thermogram is used as input for feature extraction, followed by feature optimization and ranking using different ranking techniques. The best combination of the top-ranked features was used as input to the classifier to stratify the thermogram images into diabetic and control groups. The performance of the proposed technique was compared with a 2D CNN-based image classification model for comparative evaluation. Various image enhancement techniques were utilized to enhance the 2D thermogram images and improve the performance of the 2D CNN [36].
A database containing age, gender, height, and weight, together with 167 foot-pair thermograms from 122 participants with diabetes mellitus and 45 controls, was made public by Hernandez-Contreras et al. [29]. Continuous variables were reported with the number of missing data, median, mean, and quartiles (Q1, Q3) for diabetic and control groups (Table I). The chi-square test was conducted for gender while the rank-sum test was conducted on other features. A p-value <0.05 was used as the cut-off for statistical significance.

DATASET DESCRIPTION
The foot thermogram images were segmented to remove the background and were also segmented into four angiosomes for the medial plantar artery (MPA), lateral plantar artery (LPA), medial calcaneal artery (MCA), and lateral calcaneal artery (LCA) [37] (Figure 2). The angiosome-related information is not only useful to identify the arteries associated with ulceration risk but also shows the local temperature of each angiosome. Pixelated temperature readings for the full foot and the four angiosomes for both feet were available in the dataset, which allowed the problem to be approached in two ways: using the pixelated temperature values and using the 2D thermogram image.

Feature extraction from temperature map
Different features have been extracted by different research groups from foot thermograms over the last decade. Cajacuri et al. [38] highlighted the importance of age, gender and body mass index. Hernandez-Contreras et al. [29] developed the thermal change index (TCI), the mean temperature difference between the corresponding angiosomes from a diabetic patient and a control group, as shown in Equation (1).

$$ TCI = \frac{1}{4} \sum_{ang \in \{MPA,\, LPA,\, MCA,\, LCA\}} \left| CG_{ang} - DG_{ang} \right| \qquad (1) $$
where $CG_{ang}$ and $DG_{ang}$ are the temperature values of the angiosome for the control and diabetic groups, respectively. Barreto et al. [39] proposed features, such as estimated temperature (ET), estimated temperature difference (ETD), and hot spot estimator (HSE) for analyzing thermograms, as shown in To calculate these features, the temperature map in the thermogram image is categorized into Finally, the HSE is calculated using ET and Cl values, where Cl is the highest temperature present in the angiosome regardless of its percentage in the histogram. HSE can identify severe DFU. Saminathan et al. [31] have stressed the importance of standard statistical parameters such as mean, standard deviation, and median used in various biomedical applications [40][41][42].
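Returning to the TCI of Equation (1), the following minimal sketch computes the index for one foot. It assumes that the index is the mean of the absolute differences over the four angiosomes (MPA, LPA, MCA, LCA); the function name and all temperature values are hypothetical and are not taken from the dataset.

```python
ANGIOSOMES = ("MPA", "LPA", "MCA", "LCA")


def thermal_change_index(control_mean, patient_mean):
    """Mean absolute angiosome-wise temperature difference (degrees C).

    Assumption: absolute differences are averaged over the four angiosomes.
    """
    diffs = [abs(control_mean[a] - patient_mean[a]) for a in ANGIOSOMES]
    return sum(diffs) / len(diffs)


# Hypothetical mean angiosome temperatures (degrees C), for illustration only.
control = {"MPA": 26.0, "LPA": 26.5, "MCA": 27.0, "LCA": 26.5}
patient = {"MPA": 29.0, "LPA": 28.5, "MCA": 28.0, "LCA": 28.5}

tci = thermal_change_index(control, patient)
print(f"TCI = {tci:.2f}")  # differences 3, 2, 1, 2 give a mean of 2.00
```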
In addition to the above-mentioned features, features that are visually important for distinguishing the variation in the plantar temperature distribution were formulated. Five distinct temperature ranges were found in the dataset and verified with the TCI parameters [29]. These five temperature ranges were used to define the normalized temperature ranges (NTR). We computed the variable NTRclass j, which is the number of pixels in the class j temperature range divided by the total number of non-zero pixels, where class j can be class 1 to 5. For the class temperature ranges, we used the same ranges as reported in [29].
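The normalized temperature range feature described above can be sketched as follows. The class boundaries used here are hypothetical placeholders (the actual ranges are those of [29] and are not reproduced in this section), and the sketch assumes that background pixels are stored as zero.

```python
# Hypothetical class boundaries in degrees C; NOT the ranges reported in [29].
CLASS_RANGES = [(26.0, 28.0), (28.0, 30.0), (30.0, 32.0), (32.0, 34.0), (34.0, 36.0)]


def ntr_fractions(temperature_map, class_ranges=CLASS_RANGES):
    """For each class j, return (pixels in class j range) / (non-zero pixels)."""
    pixels = [t for row in temperature_map for t in row if t != 0]
    if not pixels:
        return [0.0] * len(class_ranges)
    return [
        sum(1 for t in pixels if lo <= t < hi) / len(pixels)
        for lo, hi in class_ranges
    ]


# Toy 3x4 temperature map; zeros are background pixels outside the foot.
foot = [
    [0.0, 27.0, 27.5, 0.0],
    [29.0, 31.0, 33.0, 35.0],
    [0.0, 29.5, 31.5, 0.0],
]

ntr = ntr_fractions(foot)
print(ntr)
```

Pixels that fall outside every class range still count in the denominator in this sketch, so the five fractions sum to one only when the ranges cover all foot pixels.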

2.2. Classification using thermogram features
Five-fold cross-validation was used in this study, where each fold was divided into an 80% training set and a 20% testing set, and 20% of the training data was set aside as the validation set. To avoid the issue of an imbalanced training dataset and biased estimates [44], the Synthetic Minority Oversampling Technique (SMOTE) [45] was used for training data augmentation. Highly correlated features, i.e. features with a correlation of more than 95%, were removed. After this correlated feature reduction, the number of features decreased from 39 to 28. The heatmap of the correlation matrix before and after removing the highly correlated features is presented in Figure 3. The reduced feature set used for further investigation was: age, gender, TCI, highest temperature value, NTR (Class 1-5), HSE, ETD, STD parameters for the different angiosomes, Full Foot, and ET, mean of LPA and LCA.
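The correlated feature reduction described above can be illustrated with a small sketch. It assumes Pearson correlation and a greedy rule that keeps the first feature of each highly correlated pair; the text does not state which member of a pair was retained, and the feature names and values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:  # constant feature: treat as uncorrelated
        return 0.0
    return cov / sqrt(vx * vy)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| <= threshold with every feature kept so far."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns for five subjects, for illustration only.
features = {
    "mean_LPA": [30.1, 31.4, 29.8, 32.0, 30.6],
    "median_LPA": [30.0, 31.5, 29.9, 32.1, 30.5],  # nearly a copy of mean_LPA
    "age": [61, 48, 55, 50, 67],
}

selected = drop_correlated(features)
print(selected)
```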
The shortlisted features were then ranked to identify the top features for binary classification. Three different feature rankings were obtained using the Multi-Tree Extreme Gradient Boost (XGBoost) [46], Random Forest [47], and Extra Tree [48] techniques. Default parameters were used for the feature ranking techniques to avoid overfitting, a common problem with a large number of features and a limited sample size [49,50]. The top-ranked features from each ranking technique were then investigated systematically to identify the combination of features that gave the best performance.
Classifiers: For a detailed investigation, different classifiers such as multilayer perceptron (MLP) [ i.e., the number of neighbors to be compared. Once the parameter "k" is determined, the object's distance is computed with every object available in the dataset and the k smallest distances are identified. XGBoost is an optimized ensemble algorithm based on GBDT (Gradient Boosting Decision Tree). The principal idea of the boosting algorithm is that numerous decision trees perform better than a single one. LDA is a multi-class classification model, which can also be used for dimensionality reduction. Random Forest is an ensemble of Decision Trees that combines the qualities of filter and wrapper methods. Extra Tree is a type of ensemble learning technique which aggregates the results of multiple de-correlated decision trees collected in a "forest" to output its classification result. It is similar to a Random Forest and only differs from it in the manner of construction of the decision trees in the forest. The AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset, with the weights of incorrectly classified instances adjusted so that subsequent classifiers focus more on difficult cases.
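As a small illustration of the KNN procedure described above (distances to every stored object, then the k smallest distances), the sketch below classifies one query sample by majority vote. The Euclidean distance, the two-feature samples, and the function name are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Rank all training samples by their distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (for example TCI and one NTR class fraction).
train_x = [(0.5, 0.05), (0.8, 0.10), (1.0, 0.08), (3.0, 0.40), (3.5, 0.55), (4.2, 0.60)]
train_y = ["CG", "CG", "CG", "DM", "DM", "DM"]

label = knn_predict(train_x, train_y, (3.2, 0.45), k=3)
print(label)
```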
In this experiment, 3 feature selection techniques with 10 machine learning models were investigated with 28 optimized features to identify the best-combined results in 840 investigations.
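The count of 840 investigations is consistent with a full grid over ranking technique, classifier, and number of top-ranked features. The sketch below enumerates such a grid under the assumption that the third factor is the top-k feature subset for k = 1 to 28; the classifier names are placeholders for the 10 models.

```python
from itertools import product

rankers = ["XGBoost", "Random Forest", "Extra Tree"]
classifiers = [f"classifier_{i}" for i in range(1, 11)]  # placeholders for the 10 models
top_k_values = range(1, 29)  # assumed: top-1 ... top-28 ranked features

# One experiment per (ranking technique, classifier, number of top features).
experiments = list(product(rankers, classifiers, top_k_values))
print(len(experiments))  # 3 * 10 * 28 = 840
```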

2.3. Thermogram Image Classification by 2D CNNs
The application of 2D CNNs in biomedical applications is popular for automatic and early detection of abnormalities such as COVID-19 pneumonia [61][62][63], Tuberculosis [64], community acquired pneumonia [65], and many others [66]. As before, five-fold cross-validation was applied, i.e. the dataset was divided into five folds, and performance metrics were reported for cumulative folds. Overall accuracy and weighted average of Precision, Sensitivity, Specificity, and F1-Score are reported. Since the binary class dataset is not balanced and the number of images in 80% of the dataset (training set per fold) was small, the training dataset was augmented using image rotation and translation [61][62][63][64][65]. The details of the training, validation and testing dataset for 2D binary classification are presented in Table II.
Image Enhancement: Image enhancement techniques such as Histogram Equalization (HE) [72], Adaptive Histogram Equalization (AHE), and Gamma correction [73] can help 2D CNNs to improve classification performance [36]. We have used the AHE technique ( Figure 3
TP is the number of thermograms correctly identified as DM, FP is the number of thermograms incorrectly identified as DM, TN is the number of thermograms correctly classified as CG, and FN is the number of thermograms incorrectly identified as CG. We report the overall accuracy and weighted performance metrics, with a 95% confidence interval (CI), for Sensitivity, Specificity, Precision, and F1 Score. In addition, to compare the computational complexity of the different machine learning techniques, the inference time was calculated for the best performing 2D CNN models and 1D classifiers. The models that can be deployed in a smartphone were also identified.
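The confusion-matrix counts defined above (TP, FP, TN, FN) map onto the reported metrics as sketched below. The standard textbook definitions are assumed here, with DM as the positive class; the counts in the example are hypothetical and the confidence interval computation is omitted.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (DM = positive class)."""
    sensitivity = tp / (tp + fn)  # recall for the DM class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```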
All the experiments were performed on a computer with the following configuration: CPU Intel i7-10750H @2.6 GHz, GPU NVIDIA GeForce RTX 2070 Super, RAM 32 GB. Matlab 2020a was used for initial pre-processing and scikit-learn and PyTorch were used for classical machine learning and deep learning models, respectively.

Results and Discussion
The experimental results are divided into two sections: The first section presents the foot ulcer detection

Detection results by deep CNN models
The detection results of six deep CNN models for classifying the thermograms into control and diabetic groups from a single foot thermogram, without and with image enhancement, are presented in Tables III and IV, while the results for both feet are shown in Tables V and VI, respectively. It can be seen that the original thermograms perform better than the enhanced images (AHE and Gamma) (Table IV). We have further investigated whether using a combination of foot images improves the detection performance. It was found that the Gamma-enhanced dual-foot thermogram outperformed the other methods (Table V). Interestingly, the shallow network MobilenetV2 provides the best performance, with an overall sensitivity of 95.81% for diabetic foot detection and class-wise sensitivities of 96.72% and 93.33% for DM and CG, respectively.
The improvement obtained by combining foot images is explained by the fact that combined foot thermograms provide more distinguishable features, which are further enhanced by the image enhancement techniques.
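The overall sensitivity reported above for MobilenetV2 is consistent with a support-weighted average of the two class-wise sensitivities. The check below assumes one dual-foot image per participant (122 DM and 45 CG, as in the dataset description) pooled over the five test folds; this pooling is an assumption rather than a stated detail.

```python
# Assumed class sizes for the dual-foot task: one image per participant.
n_dm, n_cg = 122, 45
sens_dm, sens_cg = 0.9672, 0.9333  # class-wise sensitivities reported above

# Weighted average: each class contributes in proportion to its sample count.
weighted = (n_dm * sens_dm + n_cg * sens_cg) / (n_dm + n_cg)
print(f"{100 * weighted:.2f}%")  # 95.81%
```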

Feature-based detection results
We have investigated the performance of the 10 traditional classifiers with the three feature selection techniques and different combinations of optimized features. The summary of the top-performing five combinations is presented in Table VII. It can be seen that the AdaBoost classifier with the Random Forest feature selection technique and the top 10 features shows the best performance, with an overall sensitivity of 96.71% for diabetic foot detection and class-wise sensitivities of 97.75% and 93.85% for DM and CG, respectively, which is better than the top performance achieved by the deep CNN models.
The top-performing networks that can be deployed on smart portable devices are shown as square blocks, while diamond blocks represent non-deployable models.
To the best of the authors' knowledge, this is the first detailed investigation for diabetic foot detection using deep CNN models versus traditional machine learning approaches. All possible combinations in terms of classifier and feature selection techniques, along with the ranked features, were investigated. As can be seen from Table VII, the AdaBoost classifier outperforms other classifiers and the random forest feature ranking technique provides the best feature combination. The top 15 features among 28 features using Random Forest and Extra Tree feature selection techniques, after removing the highly correlated features from the initial 39 features, are shown in Figure 7. It is evident from Table VII
It should be noted that the feature-based classification was done using single foot thermogram features, which outperformed the dual-foot approach using enhanced thermogram images with deep CNNs. However, in the feature-based approach, demographic information such as age helps to improve performance, as reported in previous work [38]. Peregrine et al. [39] have identified 11 regions of interest (ROI), which can be used to identify the diabetic foot with the help of ET, ETD, and HSE.
Figure 8, which shows the ROC curves for the top 1 to 10 feature combinations, also confirms that the top 10 feature combination provided the best AUC.
As the hallux/big toe is a prominent region of interest and is in the LPA section of the foot, its contribution to the classification of the foot as diabetic or control is vital, so it is natural for it to be represented among the top 10 features. Age is a strong predictor of the diabetic foot, as observed in this study and reported previously [38]. Minor temperature variation is typically expected in the feet, but less variation, indicated by a lower standard deviation of temperature in the LPA and MCA angiosomes, can also be an indicator of the diabetic foot. TCI is also an important indicator as it is a summary of the temperature variation in all angiosomes.
To the best of our knowledge, no previous study has reported an image enhancement effect for the detection of the diabetic foot using thermogram images. Different pre-trained networks with and without image enhancement techniques were investigated and it was found that the image enhancement techniques helped in the classification performance. The best performing Adaboost classifier can be deployed in a smartphone and can be used in the foot clinic and by users in the home setting for the early detection of DFU.
The following interesting observations can be summarized from this study:
• Gamma correction, due to its special feature enhancement, helped the network to distinguish the diabetic and control groups using the dual-foot thermogram.
• A single-foot thermogram in any CNN-based classification does not improve the classification performance compared with the dual-foot approach.
• Of the various machine learning algorithms tested on the optimized feature sets, the AdaBoost classifier with the random forest feature ranking technique outperforms all other classifiers and the 2D image-based deep learning approach.

V. CONCLUSION
Diabetic foot ulceration has a major impact on morbidity and mortality in patients with diabetes [5].
Early detection may help to limit DFU progression and eventually amputation. The application of artificial intelligence for early detection may have considerable utility for health care professionals, especially in primary care, and for caregivers and patients to keep track of their disease. Such online solutions become more important particularly during pandemic situations where healthcare support is drastically affected due to the burden on the healthcare system. In this study, we propose a classical machine learning-based framework for the early detection of the diabetic foot from thermogram images captured using Infra-Red