A Comprehensive Review on Medical Diagnosis Using Machine Learning

Abstract: The unavailability of sufficient information for proper diagnosis, incomplete communication or miscommunication between the patient and the clinician or among healthcare professionals, delayed or incorrect diagnosis, clinician fatigue, or high diagnostic complexity under limited time can lead to diagnostic errors. Diagnostic errors have adverse effects on the treatment of a patient. Unnecessary treatments increase medical bills and deteriorate the health of a patient. Such diagnostic errors, which harm the patient in various ways, could be minimized using machine learning. Machine learning algorithms could be used to diagnose various diseases with high accuracy. Machine learning could assist doctors in making timely decisions and could also serve as a second opinion or supporting tool. This study aims to provide a comprehensive review of research articles published from 2015 to mid-2020 that have used machine learning for the diagnosis of various diseases. We present the machine learning algorithms used over the years to diagnose various diseases. The results of this study show the distribution of machine learning methods by medical discipline. Based on our review, we present future research directions that could be used to conduct further research.


Introduction
The outcome of a treatment could be affected by mistakes made by clinicians in the diagnosis of a patient [1]. When a diagnostic error occurs, inappropriate treatment could be given and the patient would be deprived of the necessary care. Doctors are often distracted by features that seem important at the time, and thus make diagnostic errors [2]. The surrounding environment and the tools used for diagnosis can also lead to diagnostic errors [3]. All these factors could have a significant adverse effect on the patient's health, increase overall medical expenditure, and cause psychological discomfort [4].
CMC, 2021, vol.67, no.2
To enhance healthcare services, including minimizing diagnostic errors and providing patients with proper treatment, machine learning (ML) methods are being used to assist clinicians in making decisions. Moreover, the vast increase in the availability of medical data, together with technological advancements for storing and processing it, has also led to the adoption of ML in healthcare. ML allows computer systems to learn from past experience, and thus improves the overall efficiency of programs [5]. The mathematical model uses past experience to optimize its parameters, which are inferred from representative data [6]. This use of past data differs from conventional approaches, in which a programmer explicitly programs the entire model using rules. Precisely, in ML, the model is fed with training data consisting of both the predictors (i.e., the input features) and the target (i.e., the output). The model then learns the mapping from predictors to target and generalizes it so that it performs well even on unobserved data. This process of disease diagnosis using ML is represented in Fig. 1.
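The predictor-to-target mapping described above can be illustrated with a very small classifier. The following is a minimal sketch, assuming a nearest-centroid rule chosen only for brevity; it is not a method taken from any of the reviewed studies, and the feature values, labels, and function names (`fit_centroids`, `predict`) are hypothetical.

```python
from statistics import mean

def fit_centroids(predictors, targets):
    """Learn one mean feature vector (centroid) per class label."""
    grouped = {}
    for features, label in zip(predictors, targets):
        grouped.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Made-up training data: [body temperature in deg C, resting heart rate].
train_x = [[36.6, 70], [36.8, 72], [38.9, 95], [39.2, 101]]
train_y = ["healthy", "healthy", "ill", "ill"]

model = fit_centroids(train_x, train_y)   # learning step
print(predict(model, [39.0, 98]))         # a record not seen during training
```

The training step only sees labelled records; the final call applies the learned mapping to an unobserved record, which is the generalization step mentioned in the text.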

Figure 1: The process of applying machine learning in disease diagnosis

ML is used in the treatment of patients [7], as well as in prognosis and diagnosis [8]. It is used for various tasks in the medical disciplines, such as drug discovery [9], assisting surgeons in complex surgeries [10], interpreting medical images with high efficiency [11], and even providing an alternate opinion using predictions from electronic health records (EHR) [12]. Although ML can provide results with high accuracy in various medical tasks, including medical diagnosis, and can assist clinicians in making informed decisions, it cannot replace a human clinician or doctor.
The significant contribution of this paper is a comprehensive review of the use of ML in medical diagnosis. Various ML algorithms used in medical diagnosis are described. The results of this study provide trends in the diagnosis of various diseases using machine learning, along with challenges and future research directions. This paper is organized as follows. Section 2 provides a review of ML used in diagnosis and an overview of various ML methods. In Section 3, discussion and future research directions are provided. The conclusion is presented in the final section.

Machine Learning in Medical Diagnosis
Various ML methods have been used for the diagnosis of various diseases. The following section summarizes the articles by ML method.

Artificial Neural Network (ANN)
ANN is inspired by the structure of the human brain, which uses neurons for processing [13,14]. The simple structure of an ANN is represented in Fig. 2. ANN can be used to solve complex mathematical tasks, large signal processing problems, or even parallel computations [15,16].

Figure 2: An overview of a simple artificial neural network. The green node is the output node, which calculates the sum of the products of the weights (i.e., w1, w2, and w3) and the inputs (i.e., x1, x2, and x3). The activation function is then applied to produce the output

Due to the complexity of identifying the symptoms of urinary tract infection (UTI), Ozkan et al. [17] developed a model using ANN to improve the diagnostic performance for UTI. This system could classify between cystitis and urethritis using clinically available data. Moreover, they reported that invasive and costly methods can be avoided by using ML methods.
Diagnosing pediatric traumatic brain injury (TBI) is challenging. Therefore, Chong et al. [18] studied the feasibility of using ML to predict moderate to severe TBI. According to their study, head injury mechanism and clinical data can be used as input to develop a feasible ML model for the diagnosis of TBI.
Diarrhea is one of the leading causes of death worldwide. Abubakar et al. [19] presented a model to predict the incidence of diarrhea. The proposed method is based on ANN and can be helpful in the prevention of diarrhea. This system achieved an accuracy of 95.63%.
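The computation performed by the output node in Fig. 2 (a weighted sum of the inputs followed by an activation function) can be sketched in a few lines. This is an illustrative sketch only: the sigmoid activation, the optional bias term, and the numeric values are assumptions, since the figure caption does not specify them.

```python
import math

def sigmoid(z):
    """Logistic activation, one common choice (assumed here)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of inputs (plus an optional bias), then the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# Hypothetical inputs x1..x3 and weights w1..w3.
x = [1.0, 0.5, -1.0]
w = [0.4, -0.2, 0.1]

# Weighted sum: 0.4*1.0 + (-0.2)*0.5 + 0.1*(-1.0) = 0.2
print(neuron_output(x, w))
```

Training an ANN amounts to adjusting the weights so that outputs like this one match the known targets; that step is omitted here.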

Deep Artificial Neural Network (Deep ANN)
The deep ANN model is represented in Fig. 3. This model learns through multiple levels of representation to find complex relationships among data [20,21]. 'Deep' in deep ANN stands for consecutive layers of representation [22]. Asaoka et al. [23] compared various ML methods for classifying subjects as healthy or as having glaucoma. This study suggested that early detection of glaucoma is possible using a deep feed-forward neural network (FNN), with an area under the curve (AUC) value of 92.6%.
It is essential for emergency departments to prioritize between critically ill and stable patients. Raita et al. [24] developed a deep learning (DL) based model to predict critical care outcomes. The model uses clinical and demographic data to better optimize resource utilization. This model achieved better performance than the traditional emergency severity index used for prioritizing patients.
Wang et al. [25] provided a model to predict renal dysfunction. The proposed model can minimize the chances of adverse consequences through early prediction of renal dysfunction.
The diagnosis of any disease affecting the cardiovascular system is a complex task. In this context, Elsayad et al. [30] designed a system to assist physicians in reducing errors in the diagnosis of cardiovascular diseases. This system is based on BC and is more effective than SVM.

Classification and Regression Tree (CART)
The CART algorithm can handle raw data and can process both continuous and nominal attributes as targets [31]. It can be used to avoid overfitting [32] and to handle multivariate datasets [33]. It can also improve classification accuracy [34].
Maghooli et al. [35] studied the use of the CART algorithm to classify between different classes of Erythemato-Squamous Diseases (ESD). The results of the study were compared with other state-of-the-art methods and showed that CART achieved significant accuracy.
Aljaaf et al. [36] examined various ML models for the early prediction of chronic kidney disease. The authors coupled ML with predictive analytics to determine useful predictors. Of the 24 predictors studied, they found 30% to be useful. Hence, they concluded that predictive analytics could be beneficial in diagnosis along with ML.

Convolution Neural Network (CNN)
CNN is a DL model [22]. It is specially designed to work with images [37]. It can recognize handwriting [38], handle small datasets as well as overfitting [39], and even deal with noisy images [40]. An overview of a CNN structure is represented in Fig. 4.
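The feature-learning stage of a CNN shown in Fig. 4 alternates convolution and pooling. The following is a minimal sketch of these two operations on a tiny image, assuming no padding, a stride of one for the convolution, and non-overlapping max pooling; the image, the kernel, and the function names are illustrative and are not taken from any reviewed study. In a real CNN the kernel values are learned rather than fixed by hand.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + dr][c * size + dc]
                for dr in range(size) for dc in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1], [-1, 1]]

features = convolve2d(image, vertical_edge)  # 4x4 feature map
pooled = max_pool2d(features)                # 2x2 map after pooling
print(features)
print(pooled)
```

The feature map is non-zero only where the edge lies, and pooling keeps that response while halving each spatial dimension.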
Pneumonia is one of the leading causes of death among children. Nahid et al. [41] proposed a model that uses image processing methods and CNN on X-rays to detect pneumonia. This model achieved 97.92% accuracy. In another study, Stephen et al. [42] proposed a CNN model for the diagnosis of pneumonia. They used data augmentation methods instead of transfer learning to obtain a large amount of training data. They claimed that the resulting method achieved noteworthy accuracy.
Diagnosing esophageal cancer at an early stage is necessary as its prognosis is complex. Therefore, Horie et al. [43] demonstrated the ability of CNN to diagnose it. They used endoscopic images as input to the model and achieved 98% accuracy in diagnosing cancer at an early stage.
Islam et al. [44] designed a CNN-based model to diagnose Alzheimer's disease. The proposed model could perform multi-class classification of Alzheimer's disease. The authors stated that this model could help in the early diagnosis of Alzheimer's disease and could also prevent damage to brain tissue in patients.
Kong et al. [45] proposed a model for the early detection of acromegaly. The proposed model uses a CNN approach and achieved remarkable results, with high sensitivity and high specificity.
In the literature, it is reported that the identification of nail diseases is challenging. Therefore, Nijhawan et al. [46] proposed a CNN-based model that could diagnose nail diseases with an accuracy rate of 84.58%. The proposed model could differentiate between thirteen different nail diseases using nail images.
COVID-19 has been declared an epidemic disease of the year 2020. At the time of writing, no vaccine is available for it. Therefore, early identification of COVID-19 becomes necessary. In this regard, Elaziz et al. [47] proposed a CNN-based method to detect COVID-19 using chest X-ray images. The proposed method could help in its early diagnosis at comparatively lower cost.

Deep Convolution Neural Network (Deep CNN)
With a deeper network, the network learns new patterns of the input images at each layer. This increases its applicability in medical imaging [48,49].
Periodontitis is a common dental disease arising from poor dental hygiene. Krois et al. [50] applied deep CNN to dental radiographs to detect periodontal bone loss. Given the high complexity of detecting periodontitis, the authors conclude that ML-based models could minimize the effort involved.
Murakami et al. [51] developed a novel method to identify bone erosion in rheumatoid arthritis patients. The method, which uses deep CNN, was able to detect even fine lesion changes; it may thereby assist radiologists in finding changes in radiographs.

Decision Tree (DT)
The decision tree is represented in Fig. 5a. It is a predictive technique that derives conclusions (in leaves) from observations (in branches) [52,53]. Although DT in its conventional form cannot handle missing or inappropriate features, or even uncertainties, these limitations can be overcome using certain extensions to it [54].
Treatment of thyroid disease is well known to be a long-term process. Ionita et al. [55] analyzed the efficiency of ML methods in diagnosing and classifying thyroid disease. DT provided an accuracy of 97.35% on clinical data. The authors compared the results with other ML methods and found DT to be the most efficient method.
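The way a DT derives a conclusion, by starting at the root and following one branch per comparison until a leaf is reached, can be sketched as follows. The tree below is hand-written for illustration; the feature names (`marker_a`, `marker_b`), thresholds, and labels are hypothetical, have no clinical meaning, and are not taken from any reviewed study. Learning the tree from data is not shown.

```python
# A hand-written tree: internal nodes hold a feature and a threshold,
# leaves hold a class label.
tree = {
    "feature": "marker_a", "threshold": 4.0,
    "left": {"label": "negative"},
    "right": {
        "feature": "marker_b", "threshold": 60.0,
        "left": {"label": "positive"},
        "right": {"label": "negative"},
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, going left when value <= threshold."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

print(classify(tree, {"marker_a": 5.0, "marker_b": 50.0}))
```

Each prediction touches only the nodes on one root-to-leaf path, which is part of why a DT is easy to inspect.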

Gradient Boosting (GB)
GB is a predictive modeling method formed from an ensemble of weak predictors [56]. It is based on the idea of converting weak learners into strong ones [57].
Than et al. [58] improved the diagnosis of myocardial infarction using a proprietary algorithm that incorporates GB. They reported that variations in concentrations of cardiac troponin based on age, sex, and time could improve risk assessment for patients.
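The idea of building a strong predictor from weak ones can be sketched for a one-feature regression problem. This is a minimal illustration under stated assumptions: squared-error loss (so each round fits the current residuals), single-split "stumps" as the weak learners, and made-up data. It is unrelated to the proprietary algorithm of [58], and the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split with the lowest squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Made-up data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_gradient_boosting(xs, ys)
print(model(2), model(5))
```

No single stump fits the data well, but the sum of many small corrections does; libraries such as XGBoost build on the same principle with additional regularization and engineering.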

eXtreme Gradient Boosting (XGBoost)
XGBoost is based on gradient-boosted DT [56] and is an ensemble learning technique. XGBoost models currently dominate most other ML algorithms due to their ability to scale efficiently using minimal resources [59-61].
Taylor et al. [62] developed a model to predict urinary tract infection (UTI). Conventional methods take more than 24 hours to give a complete report. Thus, ML-based models are required for the automated diagnosis of UTI. This work uses XGBoost to accurately diagnose UTI.
Tian et al. [63] proposed a method using XGBoost for the prediction of hepatitis B surface antigen (HBsAg) seroclearance. Clinical and demographic data are used in this model. This work achieved better performance than other ML methods.
With increasing mortality among patients with COVID-19 as of mid-2020, it becomes important to identify patients at high risk. Yan et al. [64] proposed an XGBoost-based method to predict high-risk patients. The proposed method could predict mortality 10 days in advance with an accuracy rate of more than 90%.

Random Forest (RF)
RF is an ensemble method. It constructs multiple DTs and then outputs the result using majority voting in the case of classification, or takes the average over all DTs in the case of regression [65]. Fig. 5b presents an RF. It can avoid overfitting and achieve a higher accuracy rate due to its randomness [66].
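The two aggregation rules just described can be written down directly. The sketch below covers only the final combination step, assuming the per-tree predictions are already available; training the individual trees on random samples of the data is omitted, and the labels and values are made up.

```python
from collections import Counter
from statistics import mean

def majority_vote(tree_predictions):
    """Classification: return the label predicted by the most trees.

    On a tie, Counter.most_common keeps the label that was seen first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

def average_prediction(tree_predictions):
    """Regression: return the mean of the per-tree predictions."""
    return mean(tree_predictions)

# Hypothetical outputs of five trees for one patient record.
print(majority_vote(["benign", "malignant", "benign", "benign", "malignant"]))
# Hypothetical outputs of three regression trees.
print(average_prediction([2.0, 3.0, 4.0]))
```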
Ganggayah et al. [67] proposed a method using RF to identify various factors affecting the breast cancer survival rate. The method could identify factors such as the stage of the cancer and the size of the tumor. This work achieved 82.7% accuracy, which is better than that of conventional methods.
Zou et al. [68] studied and compared various ML methods for the prediction of diabetes. Diabetes might cause several complications in patients. Thus, the study used principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) for feature reduction and the RF method for classification. The method achieved the best accuracy rate compared with other ML methods. Samant et al. [69] attempted to diagnose diabetes using retinal images. Their proposed novel model achieved a significant accuracy rate.
Wu et al. [70] developed a model to predict and classify fatty liver disease (FLD). The developed model could assist physicians in classifying and managing high-risk FLD patients. This model used RF for classification and prediction with minimal inputs.
Leha et al. [71] studied the effectiveness of ML methods for the prediction of pulmonary hypertension (PH) and found them to be effective. The authors stated that ML could require fewer clinical and echocardiographic parameters than conventional prediction methods.

Support Vector Machine (SVM)
SVM is represented in Fig. 6. As many hyperplanes are available, the main aim of SVM is to find the hyperplane with the maximum distance to the features of the classes present [72-74]. Although SVM is a supervised algorithm, Support Vector Clustering could be used in the case of unlabeled data [75].
Due to the complexity of bone structures, Singh et al. [77] proposed a diagnostic system to automatically detect osteoporosis. Based on SVM, this system could be inexpensive and more widely available than conventional methods. As the time complexity of this system is reported to be low, it could be used in real-time applications as well.
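The maximum-distance criterion described at the start of this section can be made concrete by computing the margin of a candidate hyperplane, i.e., its distance to the closest correctly classified point. The sketch below only compares two hand-picked candidates; it does not solve the optimization problem that an actual SVM solves, and the points, labels, and candidate hyperplanes are hypothetical.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1; a negative result means some point is misclassified.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Made-up, linearly separable two-feature data.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating lines, each given as (w, b).
candidates = {
    "A": ((1.0, 0.0), -3.0),   # the line x1 = 3
    "B": ((1.0, 1.0), -5.5),   # the line x1 + x2 = 5.5
}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the classes, but the one with the larger margin is the one an SVM would prefer, which corresponds to the choice illustrated in Fig. 6b.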
Kidney disease in its later stages could gradually result in kidney failure. In this regard, Ahmad et al. [78] developed a system that could help in the early diagnosis of kidney disease. The system is built using SVM and could improve the decision-making process for determining the chronic condition of kidney disease.
Sady et al. [79] introduced a technique that uses SVM with an RBF kernel to predict mortality in patients with Chagas disease. Their study finds that features extracted from the time-frequency and symbolic series of heart rate variability, along with clinical data, are good predictors of death.
Hsu et al. [80] proposed an ML-based model to identify stenosis of the extracranial and intracranial arteries. The proposed SVM model achieved better accuracy and sensitivity than conventional methods.
Influenza incurs high spending because of the high rate of false positives in its tests. To minimize these expenses, Marquez et al. [81] proposed the first model for the prediction of influenza. The proposed model uses SVM and achieves an accuracy rate of more than 90%.
Hameed et al. [82] developed a system to automatically classify skin lesions. The developed system used a quadratic SVM that could classify between six classes of skin lesions. Additionally, the proposed model achieved an accuracy rate of 83%.
As cardiovascular diseases increase the chances of mortality, it becomes necessary to diagnose them in their early stages. Louridi et al. [83] proposed a method for improving the accuracy of heart disease prediction. The authors conclude that using mean values in place of missing values improves the overall performance of the model. In another article, Ali et al. [84] introduced a novel method to predict heart failure. This method uses stacked SVMs and achieved 3.3% higher accuracy than a traditional SVM.
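Replacing missing values with the column mean, the preprocessing step credited above with improving performance, can be sketched as follows. This is an illustrative sketch and not the code of [83]; missing entries are assumed to be represented by `None`, every column is assumed to contain at least one observed value, and the records are made up.

```python
def impute_with_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for column in zip(*rows):
        observed = [value for value in column if value is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]

# Hypothetical records with two features; None marks a missing measurement.
records = [
    [120.0, 200.0],
    [None, 240.0],
    [140.0, None],
]
print(impute_with_mean(records))
```

In practice the column means should be computed on the training data only and then reused for validation and test data, so that no information leaks from the held-out records.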
Metabolic syndrome increases the likelihood of diabetes mellitus and heart disease. Therefore, Alavijeh et al. [85] proposed a method for the prediction of metabolic syndrome. The proposed method is the first to use SVM to achieve better results than DT.

Other/Hybrid
A hybrid algorithm combines two or more machine learning algorithms. Hybrid models can solve problems that the individual algorithms cannot, and they can achieve remarkable results [86]. Other ML algorithms such as GB [56,57], latent Dirichlet allocation (LDA) [87], long short-term memory (LSTM) [88], recurrent neural networks (RNN) [89], NB [90], genetic algorithm (GA) [91], particle swarm algorithm (PSA) [92], and logistic regression (LR) [93] can also provide better results. Readers unfamiliar with these algorithms are encouraged to consult the cited references.
Aphasia affects one's ability to read or write and is thus a resource-intensive condition. To improve the diagnostic process for aphasic speech evaluation, Kohlschein et al. [94] designed a system for the automatic detection and evaluation of aphasic speech. This system uses LSTM-RNN and performs very well.
Bhattacharya et al. [95] described how ML methods effectively identify hypertrophic cardiomyopathy patients with ventricular arrhythmias, an identification mechanism that reduces the risk of cardiac death. They proposed an ensemble-based model using LR and NB, which addressed imbalanced data effectively. They conclude that their model achieved considerably better performance than conventional methods.
Dhahri et al. [96] optimized a learning algorithm to classify the stages of breast tumors. According to the authors, GA could be applied successfully for optimization and for identification of the best classifier.
Senthikumar et al. [97] developed a novel method for the prediction of cardiovascular diseases. Based on clinical data, the method uses a hybrid model comprising RF and a linear model. With an 88.7% accuracy rate, the model was observed to perform better than other models based on ML and soft learning.
Due to the difficulty of recognizing blast cells, Boldu et al. [98] developed an LDA-based system that could differentiate blast cells with high precision. The authors concluded that the proposed system could help in the diagnosis of leukemia.
Dengue fever has symptoms similar to those of other types of fever, and thus it becomes necessary to diagnose it in its early stages. Gambhir et al. [99] developed a PSO-ANN based model for the early prediction of dengue fever. The efficiency of the proposed model was higher than that of other state-of-the-art ML methods.

Discussion and Future Research Directions
This study was done to assess the influence of ML in medical diagnosis. The comprehensive review of research papers in this study provides an overview of the way ML is being used to diagnose various diseases. The articles considered for review in this study are from the year 2015 to the year 2020. To understand the impact of ML in medical diagnosis over the years, the diseases are distributed by year, and this distribution is presented in Tab. 1. From Tab. 1, it can be seen that ML has been applied to various diseases from 2015 to 2020. Out of 44 articles, we observe that 4 articles, i.e., 9.09%, were published on cardiovascular diseases.
Moreover, we observe certain limitations in building ML models for disease diagnosis. These include the difficulty of finding the best ML model and the use of nonstandard datasets. The findings of this study show that researchers tend to concentrate on a single disease rather than combining related diseases in a single system. For example, separate studies were conducted for the diagnosis of COVID-19 [47] and pneumonia [41,42]. However, there is a relation between these two diseases [100], so ML methods could be used to develop a single model that could diagnose both diseases, i.e., COVID-19 and pneumonia. This could largely benefit patients suffering from both diseases simultaneously.
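The article shares reported for Tab. 1 are simple proportions of the 44 reviewed articles: 4 of 44 is about 9.09%, and 2 of 44 is about 4.55%. A short check is given below; the helper name `share_percent` is hypothetical.

```python
def share_percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)

TOTAL_ARTICLES = 44  # number of eligible articles stated in the review

print(share_percent(4, TOTAL_ARTICLES))  # 9.09
print(share_percent(2, TOTAL_ARTICLES))  # 4.55
```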
One of the important parts of any ML method is the data. Currently, due to non-uniformity in data collection and storage across geographies, the developed models could give varying accuracies on data collected from different sources. It is also observed that most researchers point out that their developed models need further validation on unbiased datasets. This issue could be addressed through data standardization and data normalization processes [101].
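Two common rescaling steps that go by the names standardization and normalization are sketched below. This is one possible reading, under stated assumptions: the text and [101] may also intend broader standardization of data collection and formats, which code of this kind does not address. The z-score and min-max formulas are standard; the function names and sample values are hypothetical, and both functions assume the values are not all identical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_normalize(values):
    """Rescale values linearly so that they lie in the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up measurements of one feature from a single data source.
readings = [10.0, 20.0, 30.0]
print(standardize(readings))
print(min_max_normalize(readings))
```

Applying the same rescaling to data from different sources puts their features on a comparable scale, although it cannot remove differences in how the data were collected.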
DL models such as CNN [22,37-40] have outperformed traditional ML models [102] and shallow neural networks on unstructured data (e.g., on image data in references [41,44]). However, on structured tabular data, they have been outperformed by both traditional ML models and shallow neural networks. Nevertheless, recent studies have shown that DL can achieve results comparable to or better than those of boosting methods [103,104] and that it can also synthesize tabular data [105,106]. Arik et al. [107] have proposed a DL method for tabular learning called TabNet. Given the benefits offered by DL, one crucial research area is its use on tabular data for disease diagnosis.

Conclusion
ML is used in medical diagnosis to reduce the overall cost of medical expenditure and as a 'second' opinion for doctors. Mathematical models could be used to make decisions. The primary aim of using ML in medical diagnosis is to improve the accuracy with which a disease is detected. This comprehensive study stresses the use of ML for effective medical diagnosis. It is observed that the use of ML in medical diagnosis has increased substantially over the years. One limitation of this study is that only articles applying machine learning to medical diagnosis are considered. Articles from the artificial intelligence domain that use techniques other than ML, such as fuzzy logic, are not considered. Another limitation is that only articles from 2015 to mid-2020 are considered.

Figure 3: Deep artificial neural network (deep ANN). The difference between ANN and deep ANN is in the number of hidden layers. As deep ANN has a greater number of hidden layers, it is more generalizable than a simple ANN

Figure 4: An overview of a typical convolution neural network (CNN). It consists of two parts: feature learning and classification. Feature learning consists of multiple convolution and pooling (also called subsampling) layers. Classification consists of a flatten layer and one or more layers of a fully connected neural network. The last layer is the output layer for multiclass classification, in which the green circle is the output predicted by the algorithm

Figure 5: (a) An example of a decision tree (DT). To predict a class label in DT, we start from the root of the tree. Based on the comparison at the root node, we follow the branch corresponding to the outcome of the comparison. Similarly, we navigate through the tree until we reach a leaf node, which predicts the outcome. (b) An example of a random forest (RF). It consists of many decision trees constructed on samples of the data. It collects the results from all the decision trees and then selects one based on majority voting

Figure 6: Support Vector Machine (SVM): (a) shows a random number of hyperplanes; (b) SVM chooses the optimal hyperplane with the maximum distance between the features of both classes

Table 1: The distribution of diseases by year

Two out of the 44 eligible articles, i.e., 4.55%, were published on each of breast cancer, COVID-19, diabetes, glaucoma, pneumonia, and urinary tract infection. Among the remaining articles, each concentrated on one disease. Tab. 2 shows the distribution of ML methods by medical discipline. The distribution of diseases by ML method is shown in Tab. 3.

Table 2: The distribution of machine learning methods by medical disciplines

Table 3: The distribution of diseases by machine learning methods