Breast Cancer Detection in the IOT Health Environment Using Modified Recursive Feature Selection



Introduction
Breast cancer (BC) is one of the most critical and common diseases affecting women worldwide; according to the American Institute for Cancer Research [1], there were 2 million new cases in 2018. Breast cancer is the fifth leading cause of death in women compared with other types of cancer. Breast cancer cells grow abnormally in breast tissue, and the proportion of affected cells gradually increases. Breast cancer is a malignant tumor that develops in breast cells. A group of dividing cells forms a lump or mass of extra tissue called a tumor, and tumors can be either cancerous (malignant) or noncancerous (benign). In countries with advanced medical technology, the 5-year survival rate of early-stage breast cancer is 80-90%, and it drops to 24% when breast cancer is diagnosed at a later stage [2]. Various invasive techniques have been used for the diagnosis of breast cancer. In the biopsy technique [3], breast tissue is collected for testing and the results are highly accurate. However, taking a biopsy from the breast is painful for the patient. Another breast cancer diagnosis technique is the mammogram [4], in which a 2-dimensional (2D) projection image of the breast is produced. However, the mammogram technique does not diagnose benign tumors effectively. Another technique for the diagnosis of breast cancer is magnetic resonance imaging (MRI) [5], which is a very complex test that provides excellent results for 3-dimensional (3D) images and displays dynamic functionality.
These diagnosis techniques are very complex to conduct, and their results do not diagnose breast cancer effectively and accurately. Additionally, these techniques require more time to generate results [6].
To resolve these complexities in invasive methods for the diagnosis of breast cancer, a noninvasive technique such as machine learning is more effective and reliable. Machine learning techniques have been used in the literature to classify breast tissue as either malignant or benign. The related literature on machine learning techniques for the diagnosis of breast cancer is briefly reviewed in this study.
Azar and El-Said [4] proposed a technique for the diagnosis of breast cancer. They used three classification techniques: radial basis function (RBF), probabilistic neural network (PNN), and multilayer perceptron (MLP). These classifiers were trained and tested with the breast cancer dataset. Performance evaluation metrics such as accuracy, specificity, and sensitivity were used to evaluate the classifiers. The MLP obtained 97.80% and 97.66% classification accuracy for training and testing, respectively. In another study, Aličković and Subasi [7] proposed a breast cancer prediction system using two Wisconsin Breast Cancer (WBC) datasets, with a genetic algorithm (GA) for feature selection and a rotation forest (RF) classifier for classification. The RF obtained 99.48% classification accuracy on the features selected by the GA. Ahmad et al. [8] proposed a diagnosis system, GA-MOO-NN, for breast cancer diagnosis. The GA was used for selecting optimal features. They split the dataset into three parts: 50% for training, 25% for testing, and 25% for validation. The proposed technique achieved accuracies of 98.85% and 98.10% in the best and average cases, respectively. Hasan et al. [9] proposed a technique for the diagnosis of breast cancer using symbolic regression of multigene genetic programming. They used 10-fold cross-validation and obtained 99.28% accuracy. Albrecht et al. [10] proposed a technique to diagnose breast cancer and achieved 98.8% accuracy. Pena-Reyes and Sipper [11] proposed a classification technique based on the fuzzy-GA technique and achieved 97.36% accuracy. Akay [12] proposed a breast cancer diagnosis system using the F-score method for feature selection and a support vector machine (SVM) and obtained good performance results.

Zheng et al. [13] used the K-means algorithm for feature selection and extraction, combined with SVM for classification of benign and malignant breast tumors. The proposed technique achieved high classification accuracy and low computational time. Madevi [14] used hybridized principal component analysis (PCA) combined with different classifiers, applied it to different breast cancer datasets, and achieved good accuracy. In [15], the author proposed a technique based on a memetic Pareto artificial neural network for the detection of breast cancer. The experimental results demonstrated that the proposed technique achieved good classification accuracy with very low computational time. Marcano-Cedeño et al. [16] proposed a method for breast cancer diagnosis using an artificial metaplasticity multilayer perceptron and obtained 99.26% classification accuracy. Liu et al. [17] proposed a breast cancer prediction technique based on a decision tree and applied undersampling to balance the training data. The experimental results show that the proposed method achieved very good accuracy. Onan [18] designed an intelligent technique for breast cancer detection. He used fuzzy-rough instance selection and consistency-based feature selection, and for breast cancer detection he used the fuzzy-rough nearest neighbor algorithm. Sheikhpour et al. [19] designed a technique based on particle swarm optimization integrated with nonparametric kernel density estimation for breast cancer prediction. Rasti et al. [20] designed a breast cancer diagnosis technique using a mixture ensemble of convolutional neural networks and achieved 96.39% accuracy. Ani et al. [21] proposed an IoT-based patient monitoring and diagnostic prediction tool using an ensemble classifier, and the system achieved 93% accuracy. Yang et al. [22] proposed an IoT cloud-based wearable ECG monitoring system for smart healthcare, and the proposed system performed very well in the diagnosis of diseases.
The major aim of this article is to propose an IoT-based predictive system based on machine learning that successfully distinguishes people with breast cancer from healthy people. The machine learning predictive model SVM was used for the classification of subjects as malignant or benign. The recursive feature selection algorithm (REF) was adopted to select features that improve the classification performance of the SVM classifier. We adopted REF for feature selection in this study because the classification performance of the REF FS-based method is good compared with other methods for classifying BC and healthy people. Other works used feature selection algorithms such as LASSO, MRMR, LLBFS [23], relief with BFO [24], relief [25], and a two-stage feature selection method [26]. The training/testing split validation method was used to select the best hyperparameters for model evaluation. Performance evaluation metrics such as classification accuracy, sensitivity, specificity, F1-score, Matthews correlation coefficient (MCC), and model execution time were used to check the performance of the proposed system. The proposed system has been tested on the BC dataset, which is available at the UCI repository.

Wireless Communications and Mobile Computing

The important contributions of this research study are as follows:
(1) Breast cancer detection in the IoT health environment.
(2) The modified REF algorithm is used for feature selection, and the SVM classifier is trained and tested on the selected features. The performance of SVM was also checked on the full-feature set and compared with its performance on the best-selected feature subset, at which the classifier achieved optimal performance.
(3) Finally, we concluded that the proposed system can be used for effective diagnosis of BC. Furthermore, it can be incorporated easily in the healthcare system for BC diagnosis.
The remaining sections of this article are organized as follows. Section 2 describes the BC dataset, preprocessing techniques, the feature selection algorithm REF, and the classification algorithm SVM in detail. Furthermore, the validation technique and performance evaluation metrics are also discussed in this section. In Section 3, the BC diagnostic experimental results are analyzed and discussed in detail. Finally, the conclusion and future work direction are presented in Section 4.

Dataset.
The dataset "Wisconsin Diagnostic Breast Cancer (WDBC)" was created by Dr. William Wolberg at the University of Wisconsin and is available at the UCI machine learning repository [27]. It was used in this study to design a machine learning-based system for the diagnosis of breast cancer. The dataset contains 569 subjects with 32 attributes, of which 30 are real-valued features. The target output label, diagnosis, has two classes that represent malignant or benign subjects. The class distribution is 357 benign and 212 malignant subjects. Thus, the dataset is a 569 × 32 feature matrix.

Method Background.
In the following subsections, the background of the proposed method is discussed in detail.

Dataset Preprocessing.
Before applying machine learning algorithms to classification problems, data preprocessing is necessary. Preprocessed data [28, 29] reduces the computation time of the classifier and increases its classification performance. Methods such as missing value detection, the standard scaler, and the min-max scaler are widely applied in dataset preprocessing. The standard scaler ensures that every feature has mean 0 and variance 1; thus, all features are on the same scale. The min-max scaler shifts the data so that all features range between 0 and 1. A feature that has an empty value in a row is removed from the dataset.
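The two scaling steps described above can be illustrated with a minimal sketch in plain Python. The function names `standard_scale` and `min_max_scale` are hypothetical and chosen for illustration only; the study itself does not specify its implementation beyond the use of Python machine learning libraries. Each function operates on a single feature column, and the handling of constant columns is an assumption.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Rescale one feature column to mean 0 and variance 1."""
    mean = fmean(column)
    std = pstdev(column)  # population standard deviation
    if std == 0:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


def min_max_scale(column):
    """Rescale one feature column so its values lie between 0 and 1."""
    low, high = min(column), max(column)
    if high == low:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]
```

In practice, the scaling statistics would be estimated on the training portion of the data and then reused for the validation portion, so that no information from the validation subjects leaks into training.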

Modified Recursive Feature Elimination Algorithm (RFE).
The process of feature selection can be viewed as selecting a feature subset from the available feature set. The data space is very large, and subspace/feature selection is critically necessary for the specificity of the data. Feature selection has two advantages: firstly, it improves the accuracy of the classifier, and secondly, it reduces the computation time of the machine learning algorithm [6]. REF is a feature selection algorithm that fits a model and removes the irrelevant feature or features until the specified number of features is reached. A model is then built on the features that remain from the original set. The remaining features are those that contribute most to the target label.
The recursive feature elimination method for support vector machines [30] can be implemented in the iterative steps listed in Algorithm 1.

Classification.
In this study, the following classifier was used for the classification of BC and healthy people. The brief theoretical and mathematical background of the classifier is presented.
The support vector machine (SVM) is a machine learning algorithm that has mostly been used for classification problems [24, 31-35]. SVM uses a maximum margin strategy that is transformed into solving a complex quadratic programming problem. Due to its high classification performance, SVM is widely used in various applications [6, 34, 35]. In a binary classification problem, the instances are separated by a hyperplane $w^T x + b = 0$, where $w$ is a d-dimensional coefficient vector that is normal to the hyperplane, $b$ is the offset from the origin, and $x$ refers to the data values. Training the SVM yields $w$ and $b$. In the linear case, $w$ can be solved by introducing Lagrange multipliers $\alpha_i$. The data points on the borders are called support vectors. The solution of $w$ can be expressed as follows:
$$w = \sum_{i=1}^{n} \alpha_i y_i x_i,$$
where $n$ is the number of support vectors and $y_i$ are the target labels of $x_i$. Once the values of $w$ and $b$ are calculated, the linear discriminant function can be written as
$$g(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i x_i^T x + b\right).$$
In the nonlinear scenario, using the kernel trick, the decision function can be written as
$$g(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right).$$
Positive semidefinite functions that obey Mercer's condition can serve as kernel functions [33]. The polynomial kernel is expressed as
$$K(x, x_i) = \left(x^T x_i + 1\right)^{p},$$
where $p$ is the polynomial degree, and the Gaussian kernel is expressed as
$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right).$$
There are two parameters that should be determined in the SVM model: $C$ and $\gamma$.
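As a rough illustration of the kernels and the kernel decision function discussed above, the following sketch evaluates them in plain Python. It assumes the standard textbook forms of the polynomial and Gaussian kernels; the function names, the default polynomial degree of 3, and the toy support vectors in the usage note are hypothetical and are not taken from the study.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=3):
    # Assumed form: (x . z + 1) ** degree; the default degree is illustrative.
    return (dot(x, z) + 1.0) ** degree


def gaussian_kernel(x, z, gamma=0.0001):
    # Assumed form: exp(-gamma * ||x - z||^2).
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def decide(x, support_vectors, labels, alphas, b, kernel):
    """Sign of the kernel expansion over the support vectors plus the offset b."""
    total = sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if total + b >= 0 else -1
```

For example, with the hypothetical one-dimensional support vectors `[[1.0], [-1.0]]`, labels `[1, -1]`, multipliers `[0.5, 0.5]`, and `b = 0`, `decide([2.0], ...)` with the linear kernel returns 1 and `decide([-3.0], ...)` returns -1.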

Data Partition.
The dataset was divided into 70% for training the classifier and 30% for validation of the classifier.
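A 70/30 partition of this kind could be sketched as follows. This is a simple shuffled split under stated assumptions: the function name `split_train_validation`, the fixed random seed, and the rounding rule are hypothetical, and the study does not say whether its split was stratified by class.

```python
import random


def split_train_validation(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the subjects and split them into training and validation parts."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = round(train_fraction * len(indices))
    train_idx, val_idx = indices[:cut], indices[cut:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    validation = ([rows[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, validation
```

With the 569 subjects of the WDBC dataset, this particular rounding rule gives 398 training and 171 validation subjects; the exact counts used in the study are not stated.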

Performance Evaluation Metrics.
Evaluation metrics are used to evaluate the performance of the classifier. In this study, several performance evaluation metrics were used. Table 1 shows the confusion matrix of the binary classification problem, whose entries are defined as follows:
(1) TP (true positive) if a BC subject is classified as BC
(2) TN (true negative) if a healthy subject is classified as healthy
(3) FP (false positive) if a healthy subject is classified as BC
(4) FN (false negative) if a BC subject is classified as healthy
According to Table 1, we compute the following metrics, which are expressed mathematically in equations (6)-(10), respectively.
(1) Classification Accuracy. The accuracy shows the overall performance of the classification system. It is the probability that the diagnostic test is performed correctly:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
(2) Sensitivity/Recall. It is the ratio of correctly classified BC subjects to the total number of BC subjects:
$$\text{Sensitivity} = \frac{TP}{TP + FN}.$$
(3) Specificity. It is the ratio of correctly classified healthy subjects to the total number of healthy subjects; that is, the diagnostic test is negative and the person is healthy:
$$\text{Specificity} = \frac{TN}{TN + FP}.$$
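The metrics reported in this study can all be derived from the four confusion matrix counts. The following sketch assumes the standard definitions of the F1-score and the Matthews correlation coefficient, since their formulas are not given in the text above; the function name `classification_metrics` and the convention of returning 0 for an undefined ratio are hypothetical choices.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the ratio is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion matrix counts."""
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,  # classification error
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        # Standard F1: harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # Standard MCC definition.
        "mcc": _ratio(
            tp * tn - fp * fn,
            math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        ),
    }
```

Note that these functions return fractions between 0 and 1 (and between -1 and 1 for MCC), whereas the results tables in this study report percentages.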

Proposed Predictive System for Breast Cancer Prediction.
The procedure of the proposed system for breast cancer prediction is given in Algorithm 2. The flowchart of the proposed system is given in Figure 1.

Experimental Results Analysis and Discussion
In this section, we conduct experiments on breast cancer prediction using the feature selection algorithm for appropriate feature selection. The machine learning predictive model SVM has been used for the prediction of breast cancer. The dataset "Wisconsin Diagnostic Breast Cancer (WDBC)" was created by Dr. William Wolberg at the University of Wisconsin and is available at the UCI machine learning repository [27]. This dataset is used in all experiments.

Classification Results of SVM (Linear).
The performance of the SVM (kernel = linear) predictive model was checked for breast cancer prediction on the full-feature set and on the different feature subsets produced by the REF FS algorithm, which are tabulated in Table 3. The SVM parameter values C = 1 and γ = 0.0001 are used in all our experiments. The performance evaluation metrics are automatically computed and tabulated in Table 4.
The performance of the linear SVM on different feature subsets is reported in Table 4. With one feature, the linear SVM obtained 76% accuracy, 88% specificity, 56% sensitivity, an F1-score of 70, an MCC of 72, and 24% classification error, with a model computation time of 0.030 seconds.
The performance of the predictive model gradually increases as the number of features in the feature set increases. With the 18-feature subset, the classifier achieved its highest performance: 99% accuracy, 99% specificity, 98% sensitivity, an F1-score of 99, 1% classification error, and an execution time of 0.003 seconds. On the other hand, the performance of the linear SVM decreased when the number of features increased from 18 to 30.
The linear SVM on all 30 features achieved 95% accuracy, 96% specificity, 95% sensitivity, an F1-score of 99, 5% classification error, and an execution time of 4.547 seconds. Thus, we concluded that on the reduced 18-feature set, i.e., {F1, F2, F3, F5, F7, F8, F9, F12, F14, F17, F21, F22, F23, F25, F27, F28, F29, F30}, the linear SVM performs well, and these features are more appropriate for the diagnosis of breast cancer. Figure 3 shows the classification accuracy, specificity, and sensitivity of the linear SVM on the best-selected features. Figure 4 shows the F1-score of the linear SVM on the best-selected features. Figure 5 shows the MCC of the linear SVM on the best-selected features, and Figure 6 shows the execution time of the linear SVM on the best-selected features.

Classification Results of SVM (RBF).
The performance of the SVM (kernel = RBF) predictive model was checked for breast cancer prediction on the full-feature set and on the different feature subsets selected by the REF FS algorithm, which are tabulated in Table 3.
The SVM parameter values C = 1 and γ = 0.0001 are used in all our experiments. All the performance evaluation metrics are automatically computed and tabulated in Table 5.
Figure 8 shows the F1-score of the RBF SVM on the best-selected features. Figure 9 shows the MCC of the RBF SVM on the best-selected features, and Figure 10 shows the execution time of the RBF SVM on the best-selected features.

Classification Results of SVM (Polynomial).
The performance of the SVM (kernel = polynomial) predictive model was checked for breast cancer prediction on the full-feature set and on the different feature subsets selected by the REF FS algorithm. The SVM parameter values C = 1 and γ = 0.0001 are used in all our experiments. The performance of the polynomial SVM on different feature subsets is reported in Table 6. With one feature, the polynomial SVM obtained 64% accuracy, 100% specificity, 20% sensitivity, an F1-score of 33, an MCC of 50, and 36% classification error, with a model computation time of 0.013 seconds.
The performance of the predictive model gradually increases as the number of features in the feature set increases. With the 18-feature subset, the classifier achieved its highest performance: 97% accuracy, 97% specificity, 97% sensitivity, an F1-score of 97, 97% MCC, 3% classification error, and an execution time of 0.002 seconds. On the other hand, the performance of the polynomial SVM decreased when the number of features increased from 18 to 30. The polynomial SVM on all 30 features achieved 92% accuracy, 92% specificity, 91% sensitivity, an F1-score of 91, 92% MCC, 8% classification error, and an execution time of 0.019 seconds. Thus, we concluded that on the reduced 18-feature set, i.e., {F1, F2, F3, F5, F7, F8, F9, F12, F14, F17, F21, F22, F23, F25, F27, F28, F29, F30}, the polynomial SVM performs well, and these features are more appropriate for the diagnosis of breast cancer. Figure 11 shows the classification accuracy, specificity, and sensitivity of the polynomial SVM on the best-selected features. Figure 12 shows the F1-score of the polynomial SVM on the best-selected features. Figure 13 shows the MCC of the polynomial SVM on the best-selected features, and Figure 14 shows the execution time of the polynomial SVM on the best-selected features. The results are shown graphically for better understanding.

Classification Results of SVM (Sigmoid).
The performance of the SVM (kernel = sigmoid) predictive model was checked for breast cancer prediction on the full-feature set and on the different feature subsets selected by the REF FS algorithm.
The SVM parameter values C = 1 and γ = 0.0001 are used in all our experiments. The performance of the sigmoid SVM on different feature subsets is reported in Table 7. With one feature, the sigmoid SVM obtained 64% accuracy, 100% specificity, 20% sensitivity, an F1-score of 34, an MCC of 50, and 36% classification error, with a model computation time of 0.006 seconds. The performance of the predictive model gradually increases as the number of features in the feature set increases. With the 13-feature subset, the classifier achieved its highest performance: 84% accuracy, 54% specificity, 60% sensitivity, an F1-score of 45, 77% MCC, 16% classification error, and an execution time of 0.005 seconds. On the other hand, the performance of the sigmoid SVM decreased when the number of features increased from 13 to 30. The sigmoid SVM on all 30 features achieved 27% accuracy, 45% specificity, 2% sensitivity, an F1-score of 4, 22% MCC, and 73% classification error, with an execution time of 0.019 seconds. Thus, we concluded that on the reduced 13-feature set, i.e., {F1, F3, F5, F7, F8, F9, F12, F21, F25, F27, F28, F29, F30}, the sigmoid SVM performs best, and these features are more appropriate for the diagnosis of breast cancer. Figure 15 shows the classification accuracy, specificity, and sensitivity of the sigmoid SVM on the best-selected features. Figure 16 shows the F1-score of the sigmoid SVM on the best-selected features. Figure 17 shows the MCC of the sigmoid SVM on the best-selected features, and Figure 18 shows the execution time of the sigmoid SVM on the best-selected features.

SVM Different Kernels Performance Comparison on Best-Selected Features.
Table 8 shows the performance of the different SVM kernels on the selected feature sets. The linear SVM kernel performs well compared with the RBF, polynomial, and sigmoid kernels. The accuracy of the linear SVM was 99%, which shows the overall performance of the proposed system. The 99% specificity shows that the linear SVM effectively detected healthy people. Similarly, the 98% sensitivity shows that the linear SVM effectively detects people with breast cancer. Furthermore, the F1-score of the linear SVM is 98%, and its MCC value is 99%.
The classification error of the linear SVM was 1%. Thus, the linear SVM-based diagnostic system for breast cancer is very efficient and reliable. The second-best SVM kernel is RBF according to Table 8; on the reduced feature set, the RBF SVM achieved 98% classification accuracy, 99% specificity, 98% sensitivity, an F1-score of 98, and 97% MCC, with an execution time of 0.004 seconds.
The third-best SVM kernel is the polynomial kernel according to Table 8. The performance of the kernels is shown in Figure 19 for better understanding. The execution time of these four SVM kernels is shown in Figure 20.

Proposed Method Performance Comparison with Previous Methods.
The performance of the proposed method in terms of accuracy is good compared with previous methods. In Table 9, the accuracy of the proposed method is compared with that of different methods. Table 9 shows that the proposed method achieved high accuracy compared with other state-of-the-art methods. This might be due to appropriate feature selection by the FS algorithm.

The dataset was divided into 70% for training and 30% for validation. Additionally, performance metrics such as accuracy, sensitivity/recall, specificity, F1-score, MCC, and execution time were used for model performance evaluation. The Wisconsin Diagnostic Breast Cancer dataset, with 32 attributes (30 real-valued features) and 569 instances, available in the UC Irvine repository, was used for testing the proposed system. Machine learning libraries in Python were used for the implementation and development of the proposed system. The analysis of the experimental results shows that the proposed system classifies malignant and benign subjects effectively.
The improvement in the prediction of malignant and benign subjects might be due to the varying contributions of the BC features. These findings suggest that the proposed diagnosis system could be used to accurately predict BC and, furthermore, could be easily incorporated in healthcare. The feature space reduced by the REF FS algorithm shows that these are highly important features that diagnose BC accurately compared with the original feature space. The classification performance of SVM with different kernels (linear, RBF, polynomial, and sigmoid) was tested, and performance on the reduced 18-feature subset is best compared with the full-feature set and the other reduced feature subsets. According to Table 8, the linear SVM kernel performs best compared with the other SVM kernels (RBF, polynomial, and sigmoid), obtaining 99% accuracy, 99% specificity, and 98% sensitivity. The 99% specificity shows that it is good for the detection of healthy people. Similarly, the 98% sensitivity shows that the classifier effectively detected BC people. According to the REF FS algorithm, the most important features are {F1, F2, F3, F5, F7, F8, F9, F12, F14, F17, F21, F22, F23, F25, F27, F28, F29, F30}. These features have a great impact on the classification of BC and healthy people.
The novelty of the study is the design of a diagnosis system to classify BC and healthy people. The system used the FS algorithm REF, SVM, the training/testing split method, and performance metrics for BC diagnosis. For better diagnosis of breast cancer, a machine learning-based decision support system is more reliable. Furthermore, we know that irrelevant features degrade the performance of the diagnosis system and increase computation time. Hence, another innovative part of the proposed study is the use of a feature selection algorithm to select the relevant subset of features that improves the classification performance of the diagnosis system. According to Table 9, the performance of the proposed system (REF-SVM) is excellent, achieving 99% classification accuracy compared with the classification performance of other proposed studies. In the future, other feature selection algorithms, optimization methods, and deep neural network classification methods will be utilized to further increase the performance of the diagnosis system for BC.

Begin
(1) Train the SVM model on the training dataset.
(2) Compute the performance metric values such as accuracy, specificity, sensitivity, and F1-score.
(3) Determine which feature is the least important in making the prediction on the testing dataset and eliminate this feature from the feature set.
(4) The model has now reduced its feature set by one feature; repeat from step 1.
(5) Select the feature set which gives the highest or lowest scoring metric.
(6) Finish
Algorithm 1: Modified recursive feature elimination.
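The elimination loop of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative outline rather than the authors' implementation: the function `recursive_feature_elimination` and its two callbacks, `score_subset` (train and score a model on a feature subset) and `least_important` (pick the feature to drop), are hypothetical names, and the sketch assumes that a higher score is better.

```python
def recursive_feature_elimination(features, score_subset, least_important):
    """Drop one feature per round and keep the best-scoring subset.

    features        -- list of feature names, e.g. ["F1", "F2", ...]
    score_subset    -- hypothetical callback: subset -> score (higher is better)
    least_important -- hypothetical callback: subset -> feature name to remove
    """
    current = list(features)
    history = []  # (subset, score) for every subset size visited
    while current:
        # Steps 1-2: train and evaluate the model on the current subset.
        history.append((list(current), score_subset(current)))
        if len(current) == 1:
            break
        # Steps 3-4: remove the least important feature and repeat.
        current.remove(least_important(current))
    # Step 5: select the subset with the highest score
    # (on ties, the largest such subset is kept).
    best_subset, best_score = max(history, key=lambda item: item[1])
    return best_subset, best_score, history
```

In a setting like the one described in this study, `score_subset` would typically fit an SVM on the training split and return its validation accuracy, and `least_important` could rank features by the magnitude of the linear SVM weights; both choices are assumptions here. Libraries such as scikit-learn also provide a ready-made recursive feature elimination class (`sklearn.feature_selection.RFE`).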

Figure 1: The proposed system for breast cancer detection.

Figure 3: Classification performance of SVM linear on different subsets of features generated by the REF FS algorithm.

Figure 11: Classification performance of SVM polynomial on different subsets of features generated by the REF FS algorithm.

Figure 20: The execution times of different SVM kernels on the best-reduced subset of features.

Table 1: Confusion matrix [6, 12, 36, 37].

Experimental Results of REF.
Feature selection algorithms are used to select more suitable features instead of using all the features of the dataset. The REF feature selection (FS) algorithm is well suited to selecting appropriate features for the predictive model. REF is a feature selection algorithm that fits a model and removes the irrelevant feature or features until the specified number of features is reached.
Before applying the feature selection algorithm and the predictive model to the data, preprocessing techniques are applied to improve the dataset. Furthermore, all the experimental results are reported in tables, and some graphs are also provided for better understanding. All experiments were conducted in Python on an Intel(R) Core(TM) i5-2400 CPU @ 3.10 GHz with 4 GB RAM and Windows 10.

Table 2: Feature information and description with some statistical measures of the Wisconsin Diagnostic Breast Cancer dataset.

Table 3: Thirty different subsets of features and their ranking created by the REF FS algorithm.

Table 5: Classification results of the SVM (kernel = RBF)-based predictive model on different feature subsets created by the REF FS algorithm.

Figure 7: Classification performance of SVM RBF on different subsets of features generated by the REF FS algorithm.

According to Table 8, SVM (kernel = polynomial) obtained 97% classification accuracy, 97% specificity, 97% sensitivity, and an F1-score of 97, and the MCC value is 97%. The execution time is 0.002 seconds. The performance of the SVM sigmoid kernel was very low compared with the other three SVM kernels; on feature subset 13, the SVM sigmoid kernel obtained 84% accuracy.

Figure 15: Classification performance of SVM sigmoid on different subsets of features generated by the REF FS algorithm.

Table 8: Excellent performance metrics results and best SVM kernel on selected feature subset.

Table 9: Proposed study classification performance and results of other previously proposed methods.