Application of uncertainty quantification to artificial intelligence in healthcare: A review of the last decade

Uncertainty estimation in healthcare involves quantifying and understanding the inherent uncertainty or variability associated with medical predictions, diagnoses, and treatment outcomes. In this era of Artificial Intelligence (AI) models, uncertainty estimation becomes vital to ensure safe decision-making in the medical field. Therefore, this review focuses on the application of uncertainty techniques to machine and deep learning models in healthcare. A systematic literature review was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our analysis revealed that Bayesian methods were the predominant technique for uncertainty quantification in machine learning models, with Fuzzy systems being the second most used approach. Regarding deep learning models, Bayesian methods emerged as the most prevalent approach, finding application in nearly all aspects of medical imaging. Most of the studies reported in this paper focused on medical images, highlighting the prevalent application of uncertainty quantification techniques using deep learning models compared to machine learning models. Interestingly, we observed a scarcity of studies applying uncertainty quantification to physiological signals. Thus, future research on uncertainty quantification should prioritize investigating the application of these techniques to physiological signals. Overall, our review highlights the significance of integrating uncertainty techniques in healthcare applications of machine learning and deep learning models. This can provide valuable insights and practical solutions to manage uncertainty in real-world medical data, ultimately improving the accuracy and reliability of medical diagnoses and treatment recommendations.


Introduction
Artificial intelligence (AI) has emerged as a promising technology with significant potential to transform the healthcare industry. AI technologies such as machine learning, natural language processing, and computer vision can analyze vast amounts of patient data and provide valuable insights to healthcare professionals. The use of AI in healthcare has the potential to revolutionize the way in which healthcare is delivered, improving patient outcomes, reducing costs, and increasing access to care. AI can assist healthcare providers in making more accurate diagnoses, predicting outcomes, and developing personalized treatment plans for patients. Additionally, AI-powered tools can help healthcare providers identify early warning signs of diseases and conditions, enabling early intervention and prevention [1]. This can greatly enhance the efficiency and effectiveness of healthcare delivery, ultimately leading to better health outcomes for patients.
Despite the promising potential of AI in healthcare, there are also concerns regarding privacy, security, and ethical considerations. As such, it is important to carefully consider the benefits and risks associated with the use of AI in healthcare and to ensure that the technology is deployed ethically and responsibly. Indeed, the 'black box' nature of these AI systems has raised concerns about their reliability and accountability [2]. The inner workings of these models are often not comprehensible to end-users, and even data scientists may struggle to interpret the algorithm [2]. This entire scenario makes it challenging for end-users to trust the AI system they are interacting with, potentially leading to skepticism or even rejection [2].
In response to this need for transparency and trust, the emerging field of explainable AI (XAI) employs techniques to enhance the interpretability of AI models [2]. XAI techniques are effective in uncovering the 'black box' aspect of machine learning models and providing explanations for the decisions they make [2]. However, while these techniques can improve the interpretability of AI models, they do not address the practical assessment of decision reliability [3]. Furthermore, XAI techniques do not capture the AI models' overconfident predictions and vulnerability to adversarial attacks [4], which can lead to user uncertainty about AI system predictions.
To ensure safety and reliability [5], it is crucial to evaluate the uncertainty of AI system predictions. The concept of uncertainty pertains to the level of confidence or ambiguity in the predictions generated by these models. It can result from a variety of factors, such as incomplete or noisy data, limited domain knowledge, or inherent randomness in the system, making it a crucial consideration in ensuring the reliability, interpretability, and safety of AI models. Providing uncertainty estimates in AI systems is essential for ensuring safe decision-making in high-risk domains characterized by diverse data sources, as seen in remote sensing [6]. Uncertainty estimates are also critical in domains where the nature of uncertainty is an essential part of the training methods, such as in active learning [7] and reinforcement learning [8]. By incorporating strategies for quantifying and communicating uncertainty in AI systems, we can enhance their effectiveness and foster greater trust in their predictions.
Predictive uncertainty is a widely used technique concerned with the uncertainty associated with making predictions or estimates using a model. It quantifies the level of confidence or reliability in the model's predictions for new or unseen data points. The most common approach for estimating predictive uncertainty involves modeling the uncertainty caused by the model itself (model or epistemic uncertainty) separately from the uncertainty caused by the data (data or aleatoric uncertainty) [9]. Aleatoric uncertainty is an intrinsic property of the data distribution [4] and arises in situations with a large amount of data that is not informative [10] or incomplete, noisy, conflicting, or multi-modal [11]. On the other hand, epistemic uncertainty occurs due to insufficient knowledge, a poor representation of the training data, or flaws in the model itself, leading to uncertainty about the model's behavior or performance in new or unseen situations [4]. While model uncertainty can be reduced by improving the architecture, learning process, and training data quality, data uncertainties are irreducible [4].
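To make the epistemic/aleatoric distinction concrete, the following illustrative Python sketch (our own toy example, not drawn from any of the reviewed studies) fits a small bootstrap ensemble of polynomial regressors to synthetic 1-D data. Disagreement between ensemble members approximates epistemic uncertainty, which grows outside the region covered by the training data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D heteroscedastic data: noise grows with |x| (aleatoric),
# and we only observe x in [-2, 2] (epistemic uncertainty outside).
x_train = rng.uniform(-2, 2, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.1 + 0.1 * np.abs(x_train))

# A small bootstrap ensemble: disagreement between members
# approximates epistemic (model) uncertainty.
members = []
for _ in range(20):
    idx = rng.integers(0, len(x_train), len(x_train))
    members.append(np.polyfit(x_train[idx], y_train[idx], 5))

x_test = np.linspace(-4, 4, 9)
preds = np.stack([np.polyval(c, x_test) for c in members])

mean_pred = preds.mean(axis=0)       # ensemble prediction
epistemic_std = preds.std(axis=0)    # spread across ensemble members

# The spread is largest where the model has never seen data:
for x, s in zip(x_test, epistemic_std):
    print(f"x = {x:+.1f}  epistemic std = {s:.3f}")
```

At x = 0 (well inside the training range) the members agree closely, while at x = ±4 they diverge sharply; improving data coverage would shrink this spread, whereas the injected observation noise would remain.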
Predictive uncertainty can be classified into three main groups according to the origin of the input data: in-domain uncertainty [12], domain-shift uncertainty [13], and out-of-domain uncertainty [14,15]. In-domain uncertainty refers to input data that is assumed to be drawn from the same distribution as the training data. This type of uncertainty arises when the model is unable to accurately predict an in-domain sample due to a lack of relevant knowledge. Additionally, design inaccuracies in the model can also contribute to in-domain uncertainty [12]. Domain-shift uncertainty [13] describes the uncertainty associated with input data drawn from a distribution that is shifted from the training distribution. This shift can be caused by a poor representation of the training data or by changes in real-world circumstances [13]. It may increase uncertainty because the deep model may struggle to explain the domain-shifted data based on the seen data used for training. Out-of-domain uncertainty [14,15] refers to the uncertainty associated with an input extracted from a subgroup of unknown data, whose distribution is dissimilar and far from the distribution of the training data. This type of uncertainty arises when the deep model is unable to explain an out-of-domain sample due to its lack of knowledge of the out-of-domain data [4].
As a result, model uncertainty encompasses what the deep model does not know due to the lack of in-domain or out-of-domain knowledge. This includes in-domain, domain-shift, and out-of-domain uncertainties. In contrast, data uncertainty only includes in-domain uncertainty caused by the nature of the data used to train the model [4]. Uncertainty can be introduced in healthcare in various ways (Fig. 1), for example:
- Variability in measurements: Medical measurements such as blood pressure, heart rate, and oxygen saturation can vary due to factors such as measurement noise, biological variability, and measurement error.
- Incomplete or missing data: Medical data collected from patients may be incomplete or missing for various reasons, such as incomplete medical records, data entry errors, or patient noncompliance.
- Uncertainty in medical diagnosis: Medical diagnosis involves making decisions based on incomplete information and subjective interpretation of medical data, which may introduce uncertainty in the diagnosis.
- Uncertainty in medical treatment: Medical treatment involves making decisions based on uncertain outcomes and potential side effects, which may introduce uncertainty in the treatment process.
This review paper provides a comprehensive overview of uncertainty estimation in healthcare. The paper reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. In addition to providing a general outline of uncertainty quantification methods applied in machine and deep learning models, the paper also discusses the most prevalent, emerging, and technically promising techniques in this research field.

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were closely followed to select the most relevant articles on uncertainty estimation methods applied to healthcare, using traditional machine learning and advanced deep learning models.

Related reviews
The topic of uncertainty is highly relevant in the field of data analysis, and several reviews have recently been published on the subject. However, these reviews have limitations in terms of their scope and focus:
- Broekhuizen et al. [16], "A Review and Classification of Approaches for Dealing with Uncertainty in Multi-Criteria Decision Analysis for Healthcare Decisions". In this review, the authors discuss techniques for estimating uncertainty in multi-criteria decision analysis for healthcare decisions, without focusing on machine and deep learning approaches or considering only medical data.
- Lambert et al. [17], "Trustworthy clinical AI solutions: a unified review of uncertainty quantification in deep learning models for medical image analysis". Here the authors focus only on deep learning approaches, neglecting machine learning ones. Moreover, the focus is only on medical images.
- Loftus et al. [18], "Uncertainty-aware deep learning in healthcare: A scoping review". The authors evaluate methods for quantifying uncertainty in deep learning for healthcare applications but analyze relatively few studies (around 30 papers).
- Gawlikowski et al. [19], "A Survey of Uncertainty in Deep Neural Networks". The authors provide a comprehensive review focusing only on deep neural networks, with a partial focus on medical images.
- The authors in Ref. [20] investigated UQ techniques in AI models and provided an overview without delving into individual study nuances or explicitly distinguishing between ML and DL methods.
Thus, there is a need for a comprehensive review that covers both machine learning and deep learning approaches and analyzes all types of medical data, including physiological signals and medical images. This review aims to provide an overview of uncertainty quantification techniques applied in healthcare, with a focus on both machine learning and deep learning frameworks. Fig. 2 shows how our review integrates the existing literature reviews, providing an overview of all the works on uncertainty estimation in healthcare.

Search strategy
Only articles published in the last decade (2013-2023) were included in this review. The appropriate journal articles were searched through the Institute of Electrical and Electronics Engineers (IEEE), Google Scholar, PubMed, and Scopus scientific repositories. For the retrieval of articles focusing on machine learning and deep learning, Boolean search strings such as "Uncertainty estimation", "Human healthcare", "Signals", "Images", "Machine learning" and "Deep learning" were used in various combinations in the Scopus and PubMed scientific repositories. Two distinct searches were performed: one focusing on uncertainty estimation methods based on machine learning, and the other on those based on deep learning. The search was conducted between September 2022 and January 2023.

Study selection and quality assessment
A total of 424 articles and 553 articles were identified, respectively, using the Boolean search strings for machine learning-based methods and deep learning-based methods. About 74 (ML) and 96 (DL) duplicate and irrelevant articles were eliminated, including articles on 'animal health' or 'model explainability'. Theses, books, and abstracts were also excluded. Thereafter, studies were included if they met the following criteria: (i) they described uncertainty estimation methods used in healthcare involving human data (images/signals), (ii) they described uncertainty estimation methods used in healthcare, based on machine learning or deep learning models, (iii) they were published between the years 2013 and 2023, (iv) they were published in a peer-reviewed journal, (v) they were published in English.
Articles were excluded if they were: (i) not written in English, (ii) a review article or pilot study, (iii) an abstract or a book chapter, (iv) too similar to other studies, (v) published before 2013, or (vi) not available in full text. After careful examination, 312 articles for machine learning and 350 articles for deep learning were excluded based on the aforementioned criteria. The final selection yielded 38 articles for machine learning and 107 articles for deep learning, focusing on uncertainty estimation methods in healthcare. Table 1 provides a summary of these articles, and Fig. 3 depicts the utilization of the PRISMA guideline in article selection for this review.

Uncertainty quantification in machine learning
Effective management of uncertainty is a crucial factor in medical decision-making, particularly in the context of diagnostic procedures. Table 2 illustrates the distribution of works based on the employed method for uncertainty management in machine learning approaches. According to the research papers, the most utilized algorithms for uncertainty quantification are: (i) Bayesian inference: Bayesian inference is a statistical inference technique that leverages Bayes' theorem [21] to combine prior knowledge of a model with observed data for analysis. It interprets probabilities as degrees of belief and allows for the estimation and management of uncertainty in the estimates. (ii) Monte Carlo simulation: Monte Carlo simulations predict system outcomes, aiding risk assessment and decision-making [22]. These simulations employ random sampling algorithms to address deterministic problems, distinguishing them from other approaches. (iii) Fuzzy systems: Fuzzy logic is a powerful approach for handling uncertainty in machine learning models. The adaptive neuro-fuzzy inference system (ANFIS) is an advanced method integrating fuzzy logic and neural networks to model uncertainty [23]. It combines fuzzy "IF-ELSE" rules and optimal parameters from the fuzzy algorithm to learn non-linear functions. ANFIS's architecture includes five layers for fuzzification, rule generation, normalization, and output generation. (iv) Dempster-Shafer theory (DST): DST is an extension of Bayesian theory [24] aimed at addressing its limitations, such as the inability to represent ignorance and to consider multiple hypotheses. DST, as a theory of evidence, integrates all potential outcomes rather than analyzing individual pieces of evidence. (v) Rough set theory (RST): RST manages uncertainty and inconsistency using an approximation space defined by upper and lower approximations [25]. These approximations can be crisp or fuzzy sets, making RST a fundamental theory in addressing uncertainty. (vi) Imprecise probability: Imprecise probability, a broader concept than traditional probability, allows for estimating uncertainty [26]. Multiple theories exist, including subjective probability and coherent lower previsions, which offer different approaches to modeling imprecise probability.
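As a minimal illustration of the imprecise-probability idea (a hypothetical example of our own, with illustrative costs, not taken from the reviewed studies), the sketch below represents the probability of disease as an interval rather than a single number and computes lower and upper expected costs for a treatment decision over the extreme points of that interval:

```python
# Hypothetical credal set: P(disease) is only known to lie in [0.2, 0.5].
p_low, p_high = 0.2, 0.5

# Assumed (illustrative) costs: treating always costs 1 unit;
# missing a true disease costs 10 units.
COST_TREAT, COST_MISS = 1.0, 10.0

def expected_cost(p_disease, treat):
    """Expected cost of a decision for a given disease probability."""
    return COST_TREAT if treat else p_disease * COST_MISS

def cost_interval(treat):
    """Lower/upper expected cost over the extreme points of the interval."""
    costs = [expected_cost(p, treat) for p in (p_low, p_high)]
    return min(costs), max(costs)

treat_lo, treat_hi = cost_interval(True)          # (1.0, 1.0)
no_treat_lo, no_treat_hi = cost_interval(False)   # (2.0, 5.0)

# Interval dominance: treating is preferred here because even its upper
# expected cost is below the lower expected cost of not treating.
print("treat:", (treat_lo, treat_hi), "no treat:", (no_treat_lo, no_treat_hi))
```

Unlike a single-point probability, the interval makes explicit that the decision would remain the same for every probability the available evidence cannot rule out.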

Related works based on Bayesian inference
In recent years, Bayesian inference has emerged as a versatile and powerful tool for addressing various scientific challenges. This statistical framework allows for the integration of prior knowledge and observed data to make informed estimations and predictions. Bayesian inference is based on the interpretation of probabilities as degrees of belief: Bayes' rule is used to combine existing information on the a priori known model with unseen data from the sample to be analyzed. This method makes it possible to estimate and effectively manage the inherent uncertainty associated with the estimation process.
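The mechanics of such a Bayesian update can be sketched with a simple conjugate example (hypothetical numbers of our own, not from the reviewed studies), estimating a diagnostic test's sensitivity with a Beta-Binomial model:

```python
from math import sqrt

# Conjugate Beta-Binomial update: prior Beta(a, b) plus k positives
# out of n trials yields the posterior Beta(a + k, b + n - k).
def posterior(a, b, k, n):
    return a + k, b + n - k

def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# Hypothetical scenario: estimating a test's sensitivity.
a0, b0 = 1, 1                       # uniform prior (total ignorance)
a1, b1 = posterior(a0, b0, 9, 10)   # 9 true positives in 10 diseased cases
a2, b2 = posterior(a1, b1, 85, 90)  # 85 more in 90 further cases

m1, s1 = beta_mean_std(a1, b1)
m2, s2 = beta_mean_std(a2, b2)
print(f"after 10 cases:  mean={m1:.3f}, std={s1:.3f}")
print(f"after 100 cases: mean={m2:.3f}, std={s2:.3f}")
```

The posterior standard deviation shrinks as evidence accumulates, directly quantifying how much uncertainty remains about the estimate at each stage.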

Table 1
Results of the Boolean search string for the respective repositories on uncertainty estimation methods using machine learning.

Lin et al. [27] developed a framework based on Bayesian inference to estimate the risk factor of nonylphenol (NP) exposure in certain foods and environments. The proposed model facilitated the construction of a probabilistic risk estimation framework that considered a population of different age groups and both genders. Zhou et al. [28] developed a model to identify possible artifacts in a reconstructed image, based on the quantification of uncertainty through a Bayesian framework. This application of Bayesian inference can be used to reconstruct medical images and to estimate the uncertainty associated with the reconstruction itself. Akkoyun et al. [29] demonstrated the effectiveness of Bayesian inference in estimating the maximum diameter of an abdominal aneurysm from CT images. This estimation enabled the assessment of the aneurysm's growth rate and facilitated the identification of appropriate treatment options. Magnusson et al. [30] proposed an estimation of the principal stratum to assess the effectiveness of treatment on disability progression in patients with secondary progressive multiple sclerosis (SPMS). Bayesian inference through Markov chain Monte Carlo (MCMC) methods using the No-U-Turn sampler (NUTS) was used to estimate the principal stratum. Lipkova et al. [31] presented a Bayesian machine-learning framework to calibrate a mathematical model of glioblastoma tumor growth from multimodal scans. Through a correct inference of tumor density, radiotherapy can be better planned. The Bayesian framework effectively quantified uncertainties in imaging and modeling, allowing for the prediction of patient-specific tumor cell density with credible ranges. Flügge et al. [32] introduced a Bayesian network for the diagnosis of three different kinds of headaches. The study explored three types of inference methods to develop different Bayesian networks for the diagnosis of a brain tumor and three different forms of headache (migraine with/without aura, tension headache, and cluster headache). Wang et al. [33] developed a model for assessing the risk factors associated with lung cancer, enabling the development of a medical expenditure model that accounts for data uncertainty through a Bayesian network. By accurately gauging the severity of cancer, the model predicted individual patients' medical expenses, aiding in effective health insurance management.

Related works based on Monte Carlo simulation
Another widely used method for dealing with uncertainty is Monte Carlo simulation. Monte Carlo simulations are a class of computational techniques that facilitate the prediction of all conceivable outcomes of a given system, thereby enabling the user to gauge the associated risks and uncertainties prior to making a decision. A distinctive feature of this approach is the use of algorithms that employ a random sampling procedure to tackle deterministic problems.
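The principle can be sketched in a few lines of Python (a hypothetical dose calculation of our own, with made-up distributions, not taken from the reviewed studies): uncertain inputs are sampled repeatedly, and risk statistics are read off the resulting output distribution.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Hypothetical dose model D = A * t with uncertain inputs:
# activity A ~ Normal(5.0, 0.2), exposure time t ~ Normal(10.0, 0.5).
A = rng.normal(5.0, 0.2, N)
t = rng.normal(10.0, 0.5, N)
dose = A * t

# Monte Carlo propagates input uncertainty to the output distribution,
# from which risk statistics can be read off directly.
mean_dose = dose.mean()
p_exceed = (dose > 55.0).mean()   # probability of exceeding a safety limit

print(f"mean dose    = {mean_dose:.2f}")
print(f"P(dose > 55) = {p_exceed:.3f}")
```

No closed-form error propagation is needed: the same sampling loop works unchanged for arbitrarily complicated deterministic models, which is what makes the technique attractive for dosimetry-style problems.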
An example of applying the Monte Carlo simulation tool for uncertainty estimation is presented by Salgado et al. [34], who demonstrated a computer simulation model used to estimate the impact of sugar-sweetened beverages on diabetes and cardiovascular disease. Tsai et al. [35] presented a GPU-based microscopic Monte Carlo simulation tool for the DNA damage caused by ionizing radiation. Another study [36] used Monte Carlo simulation to prove the feasibility of a dual-head Compton camera with Si/CZT material as a medical imaging system for the detection of breast cancer. Shih et al. [37] employed the Monte Carlo method to calculate the dose distribution of a blood irradiator, assessing the viability of using MAGAT gel for dose measurements. Unlike traditional dosimeters that necessitate multi-point or plane measurements, the combination of Monte Carlo simulation and polymer gel allowed for the simultaneous acquisition of a 3D dose distribution. Gasparini et al. [38] proposed Monte Carlo simulation for the evaluation of different analytic models proposed for informative visiting processes in healthcare longitudinal data. This study highlighted the potential for biased regression coefficient estimates within a longitudinal model when an informative visiting process was neglected. Furthermore, various methods proposed in the literature to address this issue were compared and evaluated, with an assessment of the differences in their performance. Lee et al. [39] demonstrated the feasibility of Monte Carlo simulation to handle the uncertainty of the proton path during proton therapy. To model the proton beam range monitoring process, they modeled a 3-D PG slit-camera system based on pixelated cadmium zinc telluride (CZT) semiconductor detectors, using the TOPAS Monte Carlo simulation toolkit.

Related works based on fuzzy systems
Fuzzy logic is a powerful approach for handling uncertainty in machine learning models by accommodating imprecise and ambiguous information. It allows nuanced reasoning and decision-making by assigning membership degrees to different categories. The Adaptive Neuro-Fuzzy Inference System (ANFIS) integrates fuzzy logic with neural networks to model uncertainty [23]. ANFIS adapts its fuzzy inference system's structure and parameters based on input-output training data, enabling accurate inference in complex systems. ANFIS's layered architecture includes input, fuzzification, rule, normalization, and defuzzification stages. Widely used for system modeling, prediction, and control, ANFIS offers a flexible and effective solution for addressing uncertainty.
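The core fuzzy-membership idea behind such systems can be illustrated with a toy example (the blood-pressure ranges below are illustrative assumptions of ours, not clinical thresholds): instead of a crisp label, each reading receives graded memberships in overlapping categories.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy partition of systolic blood pressure (mmHg).
def classify_sbp(x):
    return {
        "low":    tri(x, 70, 90, 110),
        "normal": tri(x, 100, 120, 140),
        "high":   tri(x, 130, 160, 190),
    }

# A reading of 125 mmHg is partially 'normal' and not yet 'high':
degrees = classify_sbp(125)
print(degrees)  # membership degrees, not a crisp label
```

A full fuzzy inference system (or ANFIS) would feed such membership degrees through "IF-ELSE" rules and a defuzzification stage; the graded memberships are precisely where the imprecision of the input is represented.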
Castellazzi et al. [40] presented several machine learning models combined with unimodal and multimodal MRI features to classify Alzheimer's disease (AD) and vascular dementia (VD). ANFIS proved to be the most effective classifier in distinguishing between AD and VD subjects, achieving the highest performance when using combined diffusion tensor imaging (DTI) and genetic testing (GT) features. ANFIS successfully predicted the prevalent underlying disease in 11 out of 15 MXD subjects, resulting in a correct prediction rate of 73.33%. Das et al. [41] proposed a hybrid model called Linguistic Neuro-Fuzzy with Feature Extraction (LNF-FE) to analyze medical data and predict eight different diseases, such as diabetes or breast cancer. The LNF-FE model was developed by incorporating multiple components: expanding input features through fuzzification, assigning linguistic values to these features, performing feature selection using PCA, and using an artificial neural network for prediction. The LNF-FE model exhibited superior performance and achieved better results in comparison to alternative approaches. Vidhya et al. [42] introduced the modified adaptive neuro-fuzzy inference system (M-ANFIS) for the assessment of various healthcare disorders. After a data pre-processing phase, feature selection was performed, and the count of the closed frequent item set (CFI) was estimated. M-ANFIS showed better performance than other traditional methods, such as SVM. Kaur et al. [43] devised a predictive model for various knee diseases, namely osteoarthritis (OA), rheumatoid arthritis (RA), and osteonecrosis (ON), using a combination of neuro-fuzzy and artificial neural network (ANN) techniques. The study involved a comparative analysis of the performance between a fuzzy system and the Adaptive Neuro-Fuzzy Inference System (ANFIS). The results demonstrated the efficacy of the ANFIS approach in accurately predicting knee diseases, providing valuable insights for improved diagnosis and treatment strategies in clinical settings. Liu et al. [44] exploited fuzzy inference logic to develop a decision-making model for prostate cancer detection, the analysis and fusion of medical data, and treatment recommendations with risk analysis. De Medeiros et al. [45] developed a fuzzy inference system for supporting medical decisions. The implementation of the fuzzy intelligent system demonstrated the potential to create innovative channels for the distribution of medical costs, allowing for accurate assessment of health risks for new patients. Furthermore, its contribution to the medical domain was complemented by increased sales and enhanced hospital marketing efforts, adding value to the overall system. Nguyen et al. [46] developed an integrated system for medical data classification. Specifically, the model consisted of a wavelet transform for feature extraction and a type-2 fuzzy logic system for the classification of breast cancers and heart diseases.

Related works based on Dempster-Shafer theory
The Dempster-Shafer theory is a generalization of Bayesian theory [24]. It attempts to overcome the limitations of the Bayesian theory, which is unable to describe ignorance and only considers single hypotheses. The Dempster-Shafer theory, also known as evidence theory or the theory of belief functions, provides a powerful mathematical framework for managing uncertainty and combining evidence from multiple sources in machine learning. By introducing belief functions that assign masses of belief to subsets of possibilities, it enables a more expressive representation of uncertainty and the ability to handle conflicting evidence. Incorporating the Dempster-Shafer theory into machine learning models enhances their performance and enables more robust and reliable decision-making in uncertain environments. Buono et al. [47] developed a model based on the Dempster-Shafer theory to generate a diagnosis system for certain skin diseases. After collecting a series of symptoms based on medical knowledge, the authors proposed a set of rules to enable the diagnosis of skin diseases. The Dempster-Shafer method demonstrated its effectiveness in delivering reliable outcomes for skin disease consultation. The results generated by the expert system aligned with the predetermined rules, thereby confirming the advantage of this method in accurate disease diagnosis. Prameswari et al. [48] introduced the DST to diagnose digestive diseases. A web-based e-diagnostic system based on DST provided diagnosis information based on symptoms and enabled better management of the disease. The results of this study demonstrated that by applying the Dempster-Shafer method for diagnosing digestive disorders in humans, a higher confidence value (70%) was obtained compared to the value obtained (60%) with the certainty factor method. Razi et al. [49] addressed the challenge of multi-class motor imagery tasks using a model based on DST. Unlike the traditional common spatial patterns (CSP) method that enables binary classifications, this study focused on analyzing five classes of tasks. To tackle the multi-class problem, a DST-based model was employed, which fused the results of binary classifications. Additionally, DST was introduced as a method to handle uncertainty arising from a lack of knowledge in this study. Another interesting application of DST has been proposed by Shi et al. [50], in the context of drug interactions, which can be a key factor in therapeutic decision-making. While descriptions of possible drug interactions exist for many medications, there was no description of the specific interaction analyzed in this study. Building upon this knowledge, the authors presented a model based on local classification (LCM) for predicting drug interactions for new medications. Kang et al. [51] proposed the use of DST in the analysis of the incidence of Clostridium difficile infection (CDI) in hospitals. The proposed model was based on a Gaussian mixture model (GMM) for the generation of the explicit probability criteria to assess the risk factors, and on DST for predicting the incidence of infection based on the probability criteria provided by the GMM. A model based on the combination of ambiguity measurement with DST was proposed by Wang et al. [52] for uncertainty management in medical diagnostic decision-making. The ambiguity measure assessed the level of uncertainty for each parameter, enabling the creation of basic probability assignments (BPA) for each parameter. Furthermore, the DST of evidence was employed to aggregate independent evidence into collective evidence, facilitating the ranking of candidate alternatives and identifying the best alternative. Ghesu et al. [53] introduced a model for evaluating medical images, combining uncertainty measurement with probabilistic classification to quantify the system's confidence in its outputs. By employing uncertainty estimation through Dempster-Shafer theory, the model achieved a substantial improvement in accuracy and robustness across different kinds of images, including chest radiographs, abdominal ultrasound image view-classification, and brain metastases detection in brain MR scans.
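Dempster's rule of combination, the fusion step underlying several of the studies above, can be sketched in a few lines of Python (the mass assignments in the example are hypothetical):

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: fuse two mass functions over frozenset focal elements."""
    fused, conflict = {}, 0.0
    for (A, mA), (B, mB) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            fused[inter] = fused.get(inter, 0.0) + mA * mB
        else:
            conflict += mA * mB            # mass assigned to the empty set
    k = 1.0 - conflict                     # normalise away conflicting mass
    return {A: v / k for A, v in fused.items()}

# Hypothetical diagnosis: two independent tests over {flu, cold}.
F, C = frozenset({"flu"}), frozenset({"cold"})
FC = F | C                                 # ignorance: 'flu or cold'
m_test1 = {F: 0.6, FC: 0.4}                # test 1 leans towards flu
m_test2 = {F: 0.5, C: 0.2, FC: 0.3}        # test 2 is less decided

m = combine(m_test1, m_test2)
print({tuple(sorted(k)): round(v, 3) for k, v in m.items()})
```

Note how mass can be assigned to the whole frame ("flu or cold") to represent ignorance explicitly, something a single Bayesian probability assignment cannot express, and how conflicting evidence is redistributed by the normalisation step.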

Related works based on Dempster-Shafer theory and fuzzy logic
Some authors have presented models for uncertainty management based on both fuzzy logic and Dempster-Shafer theory. For instance, Biswas et al. [54] proposed a model for the enhancement of chest X-ray images based on soft fuzzy sets and the DST approach. The proposed model involved two soft fuzzy sets of the image grey levels. The uncertainty levels of peak intensity and spatial information were handled by fuzzy intervals based on the DST approach. Porebski et al. [55] developed a set of rules for the diagnosis of liver fibrosis based on DST extended with fuzzy focal elements. Utilizing the DST to address knowledge uncertainty caused by incomplete and unbalanced data, the proposed model was successfully developed to support the diagnosis of liver fibrosis. Xiao et al. [56] developed a model to deal with the uncertainty that arises in decision-making. The model integrated belief entropy, fuzzy preference relations, and DST to measure and modulate parameter uncertainties while merging independent parameters. The model was validated in a clinical setting, considering four potential diseases: acute dental abscess, migraine, acute sinusitis, and peritonsillar abscess. The proposed method enabled the measurement of parameter uncertainty and the assessment of parameter reliability, and provided insights for clinicians regarding the impact of parameters on decision-making. Ghasemi et al. [57] presented a model for brain segmentation based on the combination of a fuzzy inference system and Dempster-Shafer theory (FDSIS). The DST was proposed to handle and reduce uncertainty in MRI segmentation. In the FDSIS algorithm, features were extracted from MRI images, including pixel intensity and spatial information. Fuzzy inference was utilized to construct rules, while the DST was employed for the aggregation phase of the fuzzy inference system. The proposed FDSIS model demonstrated enhanced accuracy in segmenting both real and simulated MRI images when compared to traditional methods, which generally lack the incorporation of uncertainty estimation and management. Li et al. [58] combined the fuzzy soft set and the Dempster-Shafer theory of evidence for decision-making applied to solving medical diagnosis problems. They used grey relational analysis to calculate the degree of uncertainty of the various parameters, from which the basic probability assignment function was obtained. Then, through the Dempster-Shafer rules, all alternatives were aggregated into a collective alternative, whereby they were ranked to obtain the best alternative. The authors demonstrated the superior performance of the model based on the fuzzy soft set and Dempster-Shafer theory, surpassing even traditional methods like Feng's method and the Naive Bayes classifier.

Related works based on rough set theory
Rough set theory is a mathematical framework utilized to address uncertainty and inconsistency in data analysis and decision-making. It provides a set of tools for handling imperfect or incomplete information. In the context of uncertainty estimation in machine learning, rough set theory enables the exploration and representation of uncertainty through the definition of upper and lower approximations. It facilitates the identification of uncertain instances and supports attribute reduction, thereby contributing to effective uncertainty management in the learning process.
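The upper and lower approximations mentioned above can be illustrated with a minimal, self-contained sketch. The patient records and target diagnosis below are hypothetical, not from any cited study: objects sharing the same attribute values are indiscernible and form equivalence classes, and a target concept X is approximated from below (certain members) and above (possible members).

```python
# Hypothetical patient records: id -> (symptom_a, symptom_b).
records = {
    1: ("fever", "cough"),
    2: ("fever", "cough"),   # indistinguishable from patient 1
    3: ("fever", "none"),
    4: ("none", "cough"),
}
X = {1, 3}  # patients known to have the target diagnosis

# Build equivalence classes from identical attribute tuples.
classes = {}
for obj, attrs in records.items():
    classes.setdefault(attrs, set()).add(obj)

# Lower approximation: equivalence classes fully contained in X (certainly in X).
lower = {o for c in classes.values() if c <= X for o in c}
# Upper approximation: equivalence classes overlapping X (possibly in X).
upper = {o for c in classes.values() if c & X for o in c}
# Boundary region: the uncertain instances.
boundary = upper - lower
```

Here patient 1 ends up in the boundary region because patient 2, indistinguishable from it, is not in X — exactly the kind of uncertain instance rough set analysis flags.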
Acharya et al. [59] developed a combination of cuckoo search and rough set (CRCS) models for knowledge inference from a cardiac disease information system. The objective was to identify which hidden features and knowledge derived from electronic information systems allowed the diagnosis of early cardiac disorders. Clinical data of 603 patients were analyzed, and an initial feature selection using the cuckoo search model yielded eight selected features. These features were then analyzed using rough set data analysis to generate classification rules. The CRCS model exhibited the highest accuracy rate (93%) compared to the rough set model (92%) and the decision tree model (90%), demonstrating its effectiveness in knowledge inference for cardiac diagnosis. Santra et al. [60] showed the use of a lattice of raw knowledge as an information system for rough sets in the design of knowledge for medical expert systems. They applied the proposed model to a simple case study from the domain of low back pain. An innovative metric was proposed to assess the consistency and reliability of rules. The authors demonstrated that the utilization of a lattice of raw knowledge facilitated effective information management in medical systems, surpassing the capabilities of conventional tabular information systems. Bania et al. [61] developed an R-ensemble method for attribute selection by exploiting rough set theory, demonstrating its superiority over methods already found in the literature. They used medical datasets, collected from the UCI Machine Learning Repository [62], containing clinical data on Wisconsin breast cancer, lung cancer, diabetes, Indian liver patients, dermatology, chronic kidney disease, and hepatitis. Except for the diabetes dataset, all other datasets exhibited missing values, which were addressed using a k-nearest neighbor (kNN) imputation method. This study aimed to tackle one of the major challenges in the analysis of medical and healthcare data, specifically dealing with datasets that contain missing and redundant information, leading to uncertainty. Jiang et al. [63] presented a novel computational model based on fuzzy mathematics and rough set theory for the assisted diagnosis of sub-health states in traditional Chinese medicine (TCM). They analyzed original medical records from the First Affiliated Hospital of Guangzhou University of Chinese Medicine. Comparative analysis with linear models, Naive Bayesian classification, and fuzzy comprehensive evaluation revealed that the novel model achieved higher overall accuracy.

Related works based on imprecise probability
Imprecise probability is a generalization of traditional probability that can be used as a framework in machine learning for handling uncertainty. Unlike traditional probability theory, which assigns precise probabilities to events, imprecise probability allows for the representation of uncertain or ambiguous information by using intervals or sets of probabilities. This approach recognizes that in real-world scenarios it is often challenging to assign precise probabilities due to limited knowledge or conflicting evidence. In machine learning, imprecise probability provides a flexible and robust framework for uncertainty estimation. It allows for the modeling of uncertain events or variables by considering a range of possible probabilities rather than a single value. This is particularly useful when dealing with incomplete or noisy data, where precise probabilities may be difficult to obtain. For instance, Giustinelli et al. [64] provided empirical evidence on the perception of dementia risk among elderly Americans without dementia through models of the imprecision of subjective probabilities. Mckenna et al. [65] reported several mathematical models that highlight how modeling can improve breast cancer treatments, especially chemotherapy and radiation therapy. The authors demonstrated how mathematical models can provide valuable contributions within the context of breast cancer therapy. In their study, Mahmoud et al. [66] investigated various machine learning models to address uncertainty measures and imprecise probabilities in the diagnosis of noisy medical data. The models proposed in their research were categorized into three groups: single tree classifiers, ensemble models, and credal decision trees (CDTs). Notably, the credal decision trees outperformed the single tree classifiers, exhibiting higher accuracy, particularly in noisy domains and databases with mostly numerical attributes.
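The interval-valued representation described above can be sketched in a few lines. The disease names and probability intervals below are illustrative assumptions, not taken from the cited studies; the decision rule shown is interval dominance, one common way to act on imprecise probabilities.

```python
# Each hypothesis carries a probability interval (lower, upper) instead of a
# point estimate; all names and numbers are illustrative.
intervals = {
    "flu": (0.3, 0.6),
    "cold": (0.2, 0.5),
    "allergy": (0.0, 0.2),
}

def non_dominated(iv):
    """Interval dominance: discard hypothesis a if some b's lower bound
    exceeds a's upper bound; the rest remain plausible candidates."""
    return {a for a, (_, hi_a) in iv.items()
            if not any(lo_b > hi_a for b, (lo_b, _) in iv.items() if b != a)}

candidates = non_dominated(intervals)
```

With these numbers "allergy" is ruled out (its upper bound 0.2 is below "flu"'s lower bound 0.3), while "flu" and "cold" both survive — the framework deliberately refuses to pick a single winner when the evidence does not warrant it.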

Uncertainty quantification in deep learning
Table 3 summarizes the distribution of works based on the method employed for uncertainty management in deep learning approaches. Data uncertainty and model uncertainty are both important concepts in data analysis and modeling. Given that various sources of uncertainty may arise in a model, developing effective methods for estimating uncertainty in its predictions is currently a subject of significant interest in the research community [4]. While data uncertainty is often reflected in the softmax output of a classification model [67], researchers have extensively investigated four main approaches to disentangle and accurately represent model uncertainty from data uncertainty [67,68]. The choice of approach depends on the number and characteristics of the deep neural networks being employed [4]: (i) Single deterministic methods: these methods employ a deterministic neural network for uncertainty quantification, relying on a single forward pass to generate predictions without explicitly modeling uncertainty [69]. (ii) Bayesian methods: these methods calculate a posterior distribution that captures the uncertainty in the parameter values of the model [70][71][72]. This distribution is subsequently used to quantify the uncertainty in predictions or estimates. (iii) Ensemble methods: these methods leverage the fusion of multiple deterministic networks to enhance model performance and generalization [73]. By combining the predictions of different networks, ensemble approaches enable the generation of more reliable and accurate results, surpassing those achieved by individual models alone. (iv) Test-time augmentation methods: these methods involve generating multiple predictions from various augmentations of the primary input data during inference and quantifying uncertainty based on these predictions [74].
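The disentangling of model from data uncertainty mentioned above is commonly done via the mutual-information decomposition of predictive entropy over multiple stochastic forward passes (e.g. from MC dropout or an ensemble). A minimal sketch, with illustrative softmax samples rather than real model outputs:

```python
import numpy as np

# T stochastic softmax predictions over C classes (illustrative numbers).
probs = np.array([
    [0.9, 0.1],
    [0.6, 0.4],
    [0.8, 0.2],
])  # shape (T, C)

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

total = entropy(probs.mean(axis=0))   # predictive (total) uncertainty
aleatoric = entropy(probs).mean()     # expected data uncertainty
epistemic = total - aleatoric         # mutual information: model uncertainty
```

Disagreement among the passes makes the entropy of the mean exceed the mean of the entropies, so the epistemic term is positive exactly when the model's parameters are the source of the uncertainty.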
When discussing deep learning frameworks, an important aspect is the calibration of the predictor. A predictor is considered well-calibrated when its predictive confidence accurately estimates the actual probability of being correct [75]. Thus, it is important to ensure that the network is well-calibrated before employing uncertainty methods [4]. There are three relevant calibration methods used in healthcare, depending on the phase in which they are applied: regularisation [4,[76][77][78]], post-processing [79][80][81], and neural network estimation methods [82,83]. These methods adjust the output probabilities of the model to better match the true probabilities of the data, resulting in more accurate and reliable predictions.
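Among the post-processing calibration methods mentioned above, temperature scaling is one of the simplest: the logits are divided by a scalar T > 1 fitted on held-out data, which softens overconfident softmax outputs without changing the predicted class. A minimal sketch with illustrative logits and temperature:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])  # illustrative, overconfident output
p_raw = softmax(logits)             # uncalibrated probabilities
p_cal = softmax(logits / 2.0)       # T = 2 yields a softer distribution
```

In practice T is chosen by minimizing the negative log-likelihood on a validation set; since dividing all logits by the same constant preserves their ordering, accuracy is untouched while confidence is brought closer to observed accuracy.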

Related works based on single deterministic methods
Single deterministic methods for uncertainty quantification in deep learning involve deterministic approaches that analyze the characteristics of the model's outputs to estimate uncertainty [4]. These methods do not directly model uncertainty as probabilistic distributions but instead focus on analyzing the properties of the model's outputs. Uncertainty can be computed through external or internal methods. External methods leverage techniques such as using gradient matrices [84,85], employing additional networks for uncertainty estimation [86], or measuring training data density in the representation space for input data [87]. Internal methods include training prior networks [68], evidential neural networks [88], and using gradient penalties [79]. While they do not provide probabilistic uncertainty estimates, they offer insights into the model's reliability and confidence. These methods can be computationally efficient compared to fully probabilistic approaches, making them practical for certain applications.
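One of the internal methods listed above, the evidential neural network, can be sketched in a few lines. The evidence vector below is an illustrative placeholder for a network's non-negative outputs, not from any cited model: it parameterizes a Dirichlet over class probabilities, so a single forward pass yields both a prediction and an uncertainty score.

```python
import numpy as np

evidence = np.array([9.0, 1.0, 0.0])  # e.g. ReLU-activated network outputs
alpha = evidence + 1.0                # Dirichlet concentration parameters
K = len(alpha)                        # number of classes

prob = alpha / alpha.sum()            # expected class probabilities
uncertainty = K / alpha.sum()         # "vacuity": 1 when there is no evidence
```

When total evidence is low the Dirichlet is nearly uniform and the vacuity term approaches 1, which is how these methods flag unreliable single-pass predictions without any sampling.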
Ktena et al. [89] developed and trained a convolutional neural network on functional MRI images of the brain. Their objective was to assess the similarity between functional brain networks by measuring the similarity of irregular graphs. Through their proposed method, they achieved a significant improvement of 11.9% in overall classification accuracy. McKinley et al. [90] developed and trained a CNN using MRI images of patients with multiple sclerosis. The authors used best-practice standards to annotate lesions and predicted the probability that the network assigns a label different from the ground truth. Their approach yielded accuracies of 75% and 85% in distinguishing stable and progressive time points, showcasing the effectiveness of their method. Devries et al. [91] developed a convolutional neural network and explored six distinct uncertainty estimation techniques to assess uncertainty in the segmentation of skin lesion images. The authors observed that the heteroscedastic classifier neural network yielded the least improvement in results compared to the other uncertainty estimation techniques, which demonstrated comparable performance. Luo et al. [92] introduced a novel deep commensal model for estimating intrinsic uncertainties in cardiac magnetic resonance images. They computed the commensal correlation between direct area estimation and bi-ventricle segmentation, achieving accurate uncertainty estimation through one-time inference based on cross-task output variability. The authors highlighted that their proposed method outperforms other approaches in terms of quantification accuracy and optimization results. Ghesu et al. [93] applied a bootstrapping uncertainty measure to their DenseNet model. By employing this uncertainty measurement, the authors found that unwanted training on chest X-ray images could be eliminated, leading to increased robustness and accuracy of the model. Additionally, the method was effective in identifying reader errors. Graham et al. [94] developed and trained a 3-dimensional U-Net model using MRI images of the brain to precisely label different regions and sub-regions of the brain. To achieve this, the authors measured cross-entropy uncertainty at progressively smaller sub-regions of the brain. The results showed a Dice score of approximately 0.85 for all regions in the uncertainty-aware model, indicating high accuracy in the segmentation task.
Liao et al. [95] developed a DenseNet model to tackle the issue of inter-observer variability in assessing the quality of cardiovascular images obtained through echocardiography. They measured aleatoric uncertainty by incorporating the variability observed among different experts. The proposed method treated this variability as aleatoric uncertainty and represented it through Laplace or Gaussian distributions in the regression space. The authors observed that their approach resulted in reduced absolute error compared to conventional regression models. Li et al. [96] applied the DistDeepSHAP uncertainty measure to assess the importance of features in autism brain images by employing a SHAP-based deep model. The results indicate that this approach has the potential to identify biomarkers associated with the disease in neuroimaging data. Ye et al. [97] utilized the neurite orientation dispersion and density imaging model and explored the Lasso bootstrap approach for uncertainty estimation of tissue microstructure in brain diffusion magnetic resonance images. The authors observed a meaningful relationship between the proposed uncertainty measures and estimation errors. The study demonstrated that the proposed method enhances model calibration, enabling better capture of uncertainty in both samples and labels.

Related works based on Bayesian methods
Bayesian methods involve using different types of stochastic deep neural networks wherein two forward passes on the same data sample generate varying results [4]. In Bayesian models, the parameters are treated as random variables. During a forward pass, the parameters are sampled from their distribution, resulting in stochastic prediction outcomes, where each prediction is based on varying model weights. Bayesian neural networks assume a prior distribution p(θ) and compute the posterior distribution over the parameter space, given by p(θ|x, y) for the training input pair (x, y). After estimation of the posterior over the weights, the prediction of an output y* for input data x* may be obtained by performing Bayesian model averaging or full Bayesian analysis [10]. Some types of Bayesian methods include Monte Carlo dropout [101], variational inference [102], sampling [103], and the Laplace approximation [104]. The choice of method depends on the specific application and the nature of the data being analyzed.
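The stochastic forward passes described above are easiest to see with Monte Carlo dropout: dropout stays active at inference, and repeated passes approximate the posterior predictive. A minimal numpy-only sketch of a toy one-layer classifier — the weights and hidden activations are random placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 2))  # hidden-to-output weights (placeholder)
h = rng.normal(size=8)       # hidden activations for one input (placeholder)

def mc_dropout_predict(T=200, p_drop=0.5):
    """T stochastic passes with a fresh dropout mask each time."""
    preds = []
    for _ in range(T):
        mask = rng.random(8) > p_drop        # random dropout mask
        z = (h * mask / (1 - p_drop)) @ W    # inverted-dropout scaling
        e = np.exp(z - z.max())
        preds.append(e / e.sum())            # softmax over two classes
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

mean_pred, spread = mc_dropout_predict()
```

The mean over passes is the Bayesian-model-averaged prediction, and the per-class spread is the uncertainty estimate most of the studies in this section build on.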
Leibig et al. [105] employed Bayesian uncertainty measures in combination with various data and deep models to classify fundus images for diabetic retinopathy. The conducted experiments revealed robust model generalization. Notably, Monte Carlo dropout outperformed other direct methods, demonstrating its ability to accurately determine and quantify uncertainty. Ozdemir et al. [106] introduced a novel approach where uncertainty measures, specifically the predictive mean and standard deviation, were fused with the original image using the Bayesian U-net model. This fusion resulted in the creation of a composite image, which was subsequently fed into the Bayesian neural network. The authors concluded that incorporating uncertainty measures into the workflow significantly improved prediction accuracy and model confidence. Jungo et al. [107] devised four residual convolutional neural network models with Monte Carlo dropout at full resolution, alongside one model incorporating the conventional weight scaling dropout technique. The position and rate of Monte Carlo dropout were varied for each model, and the performance of these five models was compared to evaluate their effectiveness in uncertainty quantification for brain tumor image segmentation. The authors concluded that informative uncertainty is obtainable by applying Monte Carlo dropout after each convolutional layer. In a subsequent study [108], the authors developed a U-net model and employed uncertainty techniques such as weighted mean entropy and mean entropy among experts for brain tumor image segmentation. The findings demonstrated that the uncertainty of the model's parameters can be determined by fusing the learned observers' uncertainty with a Monte Carlo-based Bayesian network.
Orlando et al. [109] developed a Bayesian neural network with integrated Monte Carlo dropout to provide epistemic uncertainty feedback. Results showed that the proposed uncertainty estimate corresponds inversely to the model's performance, highlighting its potential use in identifying areas that require corrections in image segmentation. Heo et al. [110] introduced a unique variational attention model that incorporates instance-dependent modeling to capture both data and model uncertainties. The model was validated on six real risk prediction tasks in the healthcare domain, involving physiological signals and images.
The authors reported significant improvements achieved by the developed model compared to existing attention models. Adrian et al. [111] integrated the Monte Carlo dropout method with their CNN model to estimate uncertainty in multiple sclerosis images. The authors concluded that this technique proves valuable in identifying scans that may require additional examination, as the variance of Monte Carlo dropout samples corresponds to model errors. Roy et al. [112] utilized the Bayesian QuickNAT model and integrated four metrics to assess segmentation uncertainty. The authors highlighted that the proposed uncertainty metrics hold promising potential for evaluating the accuracy of segmentation methods in deep models. Herzog et al. [113] combined Bayesian uncertainty techniques and advanced aggregation methods with their Bayesian neural network to achieve highly accurate stroke classification. The authors observed that the integration of Bayesian-based uncertainty methods not only enhanced stroke prediction but also improved the estimation of uncertainty in incorrect patient classifications and the detection of uncertain aggregations.
Baumgartner et al. [114] introduced a variational autoencoder model and applied the probabilistic hierarchical segmentation technique to thoracic and prostate images. The results demonstrated that the proposed technique yielded more naturalistic and diverse segmentations of images compared to other related approaches. In a separate study, Raczkowski et al. [115] employed a variational-based dropout measure for uncertainty estimation using a Bayesian neural network in the segmentation of colorectal cancer images. The authors found that the proposed uncertainty measure enhanced the speed of the deep model by approximately 45%, thereby offering a significant computational advantage. Eaton-Rosen et al. [116] conducted a study examining the application of Monte Carlo dropout and M-heads uncertainty measures in the U-net model for calculating predictive intervals during counting tasks in medical imaging. The results indicate that these uncertainty measures are effective in accurately counting histopathological cells and identifying white matter hyperintensity images. Di Scandalea et al. [117] developed a U-net model trained with Dice loss and weighted binary cross-entropy for segmenting myelin sheaths in mouse images. They utilized Monte Carlo dropout to estimate uncertainty. The authors highlighted that by examining the heatmaps generated from uncertainty estimates, users can identify potential model failures and control uncertainty for more accurate predictions in biomedical applications. Jena et al. [118] employed a Bayesian neural network with a Monte Carlo uncertainty measure for segmenting brain, cell, and chest radiograph images. The authors concluded that their proposed method improves segmentation quality and calibration, providing more accurate uncertainty estimates compared to existing techniques.
Soberanis-Mukul et al. [119] used a graph convolutional neural network with Monte Carlo dropout and Dice scores as uncertainty measures for segmenting pancreas and spleen images. The authors found that their approach enhances Dice scores for both organs compared to the original model predictions. Hu et al. [120] utilized the probabilistic U-net model to investigate uncertainty estimation in lung nodule and prostate MRI images. They specifically explored the application of variational dropout. The authors concluded that their approach led to improved predictive uncertainty estimates, enhanced sample accuracy, and increased diversity. Combalia et al. [121] employed a convolutional neural network for the classification of skin lesion images and applied the Monte Carlo dropout uncertainty estimation method. To quantify predictive uncertainty, the authors employed metrics such as entropy, variance, and the Bhattacharyya coefficient between distributions. The results indicate the successful utilization of uncertainty metrics in detecting challenging and out-of-distribution samples. Toledo-Cortes et al. [122] developed a hybrid deep learning Gaussian process model for the classification of diabetic retinopathy. In addition to predicting the mean value, the authors also computed the standard deviation as a measure of prediction uncertainty. They found that the proposed model outperformed the original deep learning model and enabled uncertainty analysis. Laves et al. [123] estimated predictive uncertainty using variational Bayesian inference with Monte Carlo dropout for regression tasks on medical image datasets. Their findings highlighted that well-calibrated uncertainty in regression tasks enables the elimination of unreliable predictions and the identification of out-of-distribution samples.
In another study, Hu et al. [124] trained a CNN on PET and CT images for the diagnosis of rare lymphoma. The authors incorporated zone-based uncertainty estimates based on the Monte Carlo dropout technique. The reported sensitivity of the model was approximately 75%, indicating its effectiveness in detecting the target condition. Nair et al. [125] developed a CNN for detecting multiple sclerosis lesions using MRI images from patients with relapsing-remitting multiple sclerosis. They employed Monte Carlo dropout to approximate probability distributions and subsequently measured variance, predictive entropy, and mutual information. The proposed method achieved a true positive rate of 0.8 and a false detection rate of 0.2, demonstrating its potential for accurate lesion detection. Kwon et al. [126] utilized a Bayesian neural network with predictive uncertainty, which allowed for the decomposition of uncertainty into aleatoric and epistemic components. The authors applied this technique to segment ischemic stroke and retinal images and concluded that it provided a deeper understanding of point predictions. Selvan et al. [127] developed a unique conditional variational autoencoder called conditional Normalizing Flow (cFlow) to improve the approximation of latent posterior distributions. The performance of their model was evaluated on two medical imaging datasets, demonstrating substantial improvements in both qualitative and quantitative measures compared to state-of-the-art methods. In a study by Seebock et al. [128], the Bayesian U-net model with Monte Carlo dropout was employed to estimate model uncertainty in retinal image segmentation. The authors found that the proposed technique achieved high accuracy in segmenting both healthy and diseased retinal images. Hiasa et al. [129] employed the Bayesian U-net CNN model along with Monte Carlo dropout and Dice scoring for uncertainty estimation in muscle CT image segmentation. They discovered a relationship between high-uncertainty pixels and segmentation failure, enabling patient-specific analysis of muscles. Xia et al. [130] implemented a Bayesian model with uncertainty-aware multi-view training on pancreas and liver tumor images. The authors concluded that applying multi-view co-training to 2D models yielded promising results. Marc et al. [131] investigated the integration of reversible blocks into the PHiSeg architecture for image segmentation. The authors reported that the recommended method required less memory compared to not using reversible blocks, while maintaining comparable segmentation accuracy.
Wickstrom et al. [132] used a CNN with a Monte Carlo dropout backpropagation algorithm to determine the uncertainty in input feature importance. The authors demonstrated that their proposed method effectively models uncertainty in input feature importance, showing significant contrasts between correct and incorrect predictions. Carneiro et al. [133] used a DenseNet model to investigate uncertainty estimation and confidence calibration in the classification of colorectal polyps. They explored both Bayesian and non-Bayesian inference methods, using entropy as an uncertainty measure. The study demonstrated that employing Bayesian methods to determine classification entropy or variance resulted in an accuracy of approximately 76%. Li et al. [134] developed a CNN model and compared three Monte Carlo dropout methods, evaluating metrics such as negative log-likelihood and expected calibration error. The authors found that the proposed method of region acquisition, as opposed to full region acquisition, led to better calibration of the model regardless of the uncertainty measure used. Quan et al. [135] [142] utilized a probabilistic U-Net model to perform density modeling on thoracic computed tomography and endoscopic polyp images. They employed a probabilistic segmentation model to learn aleatoric uncertainty as a distribution of possible annotations. The authors concluded that this approach improved predictive performance by up to 14% in modeling uncertainty. Teng et al. [143] employed a deep generative model with recurrent neural networks and trained it using clinical, imaging, genetic, and biochemical markers to investigate the progression of Alzheimer's and Parkinson's disease. The model, incorporating internal stochastic components, achieved good accuracies of 98.1% and 79.7% for Alzheimer's and Parkinson's disease, respectively. Wang et al. [144] applied a multi-instance learning approach for the classification of diabetic macular edema using optical coherence tomography images. They quantified uncertainty by measuring the mean and standard deviation of probabilistic predictions, resulting in an accuracy of approximately 95%. Zhang et al. [145] explored deep neural networks, random forest classifiers, and the light gradient boosting model for toxicity prediction in chemical compounds. They employed conformal prediction with user-defined significance levels to quantify prediction uncertainty, obtaining an average AUC of 0.734. Vranken et al. [146]

Related works based on ensemble methods
Ensemble methods involve the combination of many different deterministic networks during model inference. Hence, the prediction from an ensemble model is based on diverse predictions obtained from the different networks. Using the combined effects among different networks, researchers have found that a group of networks tends to make better decisions than a single network, leading to improved model generalization [4]. Ensemble models may be trained using weight sharing [174], reducing the number of networks [6], and various other strategies such as data shuffling or boosting [175].
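The averaging of diverse networks described above can be sketched with a toy deep ensemble. The random linear maps below stand in for independently initialized, trained networks (an illustrative assumption, not any cited model): their softmax outputs are averaged, and member disagreement serves as an epistemic uncertainty signal.

```python
import numpy as np

x = np.random.default_rng(42).normal(size=4)  # one input (placeholder)

def member_predict(seed):
    """One ensemble member: a randomly initialized linear map + softmax."""
    w = np.random.default_rng(seed).normal(size=(4, 3))
    z = x @ w
    e = np.exp(z - z.max())
    return e / e.sum()

members = np.array([member_predict(s) for s in range(5)])  # M = 5 members
ensemble_mean = members.mean(axis=0)      # ensemble prediction
disagreement = members.var(axis=0).sum()  # simple uncertainty proxy
```

In a real deep ensemble the diversity comes from independent random initializations and data shuffling during training; the inference-time recipe — average the softmax outputs, read uncertainty off the spread — is the same.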
In a recent study by Jungo et al. [176], subject-wise uncertainty measures were compared against five other uncertainty measures, including ensemble models, for brain and skin lesion image segmentation. The authors discovered that while existing uncertainty measures demonstrate good calibration at the data level, they are not well-calibrated at the subject level. Hence, subject-wise uncertainty estimates are crucial for accurate segmentation. McClure et al. [177] proposed the MeshNet architecture combined with the distributed weight consolidation technique to train on independent structural MRI datasets. The findings revealed that the distributed weight consolidation measure improved the performance of each independent test while maintaining model generalization, surpassing the standard ensemble model. Wu et al. [178] introduced the deep Dirichlet mixture model to generate point estimates and credible intervals from learned distributions for evaluating uncertainties in Alzheimer's disease classification probability. The authors discussed the usefulness of the proposed model in predicting uncertainties for multiclass classification problems.
Linmans et al. [179] conducted a comparative analysis between the performance of a multi-head convolutional neural network model combined with meta-loss functions and that of the Monte Carlo dropout and deep ensemble methods for estimating predictive uncertainty on out-of-distribution lymph node tissue images. The authors concluded, based on the results, that the multi-head convolutional neural network outperformed both Monte Carlo dropout and deep ensembles. Liang et al. [180] developed and trained four different types of CNN models using diverse datasets consisting of head CT, mammography, chest X-ray, and histological images. Instead of using the cross-entropy loss function, they introduced an auxiliary loss term that captures the difference between predicted confidence and accuracy for classification tasks, aiming to quantify model calibration error. The authors discussed that their proposed approach significantly reduces calibration error across various models and datasets. Hoebel et al. [181] compared the performance of the traditional U-Net, U-Net with Monte Carlo dropout, and deep ensembles in segmenting nodules in CT images. The deep ensemble method showed slightly better results than Monte Carlo dropout. The authors concluded that incorporating uncertainty information provides a means to assess segmentation quality automatically, even without access to ground truth. Mehrtash et al. [182] employed a fully convolutional neural network (FCN) along with model ensembling to calibrate model confidence. They conducted a comparison of results using both Dice and cross-entropy losses. The authors found that model ensembling successfully calibrated the confidence of fully convolutional networks trained with the Dice loss function. Dahal et al. [183] investigated three uncertainty measures and utilized four metrics on the ResNet model for cardiac ultrasound image segmentation. The results demonstrated that uncertainty estimation effectively identified and rejected low-quality images, leading to enhanced segmentation outcomes. The study employed three ensembling-based uncertainty models quantified using four different metrics. Chiou et al. [184] utilized an encoder-decoder network combined with a CycleGAN-based approach for uncertainty estimation in prostate image segmentation. The findings established that the proposed method improved image representations in prostate image segmentation, particularly for cancer characterization. Cao et al. [185] developed a temporal ensembling segmentation model to segment and classify masses in breast ultrasound. An uncertainty-aware unsupervised loss was also integrated into their model. Thanks to this approach, the authors obtained a pixel-wise accuracy of about 99%. Qin et al. [186] employed a CNN to estimate brain and cerebrospinal fluid intracellular volume. The authors trained an ensemble of deep models and calculated the variance in the combined results. The findings demonstrated significant relationships between estimation uncertainty and error across all measurements. Singh et al. [187] developed the Bayesian Multi-ResUNet model for the segmentation and classification of skin lesion images. The authors thoroughly investigated the effectiveness of two techniques: Monte Carlo dropout and test-time augmentation. Their findings revealed that the recommended approach not only showcases the robustness of the model but also enhances its transparency and confidence. Guo et al. [188] introduced a globally optimal label fusion algorithm and an uncertainty-guided, coupled continuous kernel cut algorithm for deep learning with shape priors. These were integrated into a deep learning ensemble algorithm designed for left ventricle segmentation and functional measurements in short-axis cardiac cine MRI. Remarkably, their model exhibited outstanding performance even when trained on small datasets (5-10 subjects) and with sparse annotations. Buddenkotte et al. [189] introduced an efficient model to calibrate deep learning ensembles for accurate classification probability approximation in the medical image segmentation of ovarian and kidney tumors. The approach was successfully validated on complex segmentation tasks using large 3D networks, showing that the generated heatmaps outperformed traditional methods in approximating classification probability.

Related works based on test-time augmentation methods
Test-time data augmentation methods predict and quantify uncertainty at inference based on multiple predictions generated from various augmentations of the primary input data. Typically, multiple test samples are created from each input by applying data augmentation methods; the entire set of augmented samples is then used to compute the predictive distribution from which uncertainty is estimated [4]. Greedy policy search [190] is an example of an augmentation policy where each stage of the search selects the sub-policy that provides the most significant improvement in the ensemble predictions, which is then added to the existing policy. These methods can improve model robustness and generalization by generating a diverse set of augmented data for testing and prediction.
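The procedure described above can be sketched as follows. The classifier here is a hypothetical toy function rather than a trained network, and the augmentations (a random horizontal flip plus mild intensity jitter) are illustrative choices; what matters is the pattern of predicting over augmented copies and using the spread as uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

def tta_uncertainty(model, image, n_aug=16):
    """Test-time augmentation: predict over several randomly augmented copies
    of one input and use the spread of the predictions as uncertainty."""
    preds = []
    for _ in range(n_aug):
        aug = image.copy()
        if rng.random() < 0.5:                       # random horizontal flip
            aug = aug[:, ::-1]
        aug = aug + rng.normal(0, 0.01, aug.shape)   # mild intensity jitter
        preds.append(model(aug))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)     # prediction, uncertainty

# Hypothetical stand-in for a trained classifier: its score depends on the
# mean intensity of the left half of the image, so flips change its output.
def toy_model(img):
    p = 1.0 / (1.0 + np.exp(-img[:, : img.shape[1] // 2].mean()))
    return np.array([p, 1.0 - p])

image = rng.normal(0, 1, (8, 8))
mean_pred, std_pred = tta_uncertainty(toy_model, image)
```

A large standard deviation across augmentations flags inputs on which the model's decision is unstable and should be treated with caution.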
In their study, Wang et al. [191] introduced a unique approach using a CNN model with a bounding box for the segmentation of fetal and brain tumor images. They further explored scribble-based segmentation and image-specific fine-tuning during testing. The authors concluded that the proposed fine-tuning technique significantly improved segmentation accuracy while reducing the user time and interactions required for the process. Ayhan et al. [192] developed a CNN model and incorporated conventional geometric and color transformations as an uncertainty measure during testing on fundus images, with the aim of analyzing the variations in the network's output. Based on their findings, the authors reported that their test-time augmentation approach provides valuable approximations for predicting uncertainties in deep models. Wang et al. [193] investigated aleatoric and epistemic uncertainties by incorporating test-time augmentation and test-time dropout methods into their CNN model. The authors' analysis revealed that the aleatoric uncertainty estimation technique yielded superior advantages compared to the test-time dropout technique; specifically, it effectively mitigated the issue of overconfident predictions, resulting in more accurate and reliable uncertainty estimates. Zhang et al. [194] introduced a novel method for MRI reconstruction, in which measurements are dynamically selected and the prediction is iteratively refined during inference to achieve optimal reconstruction. The authors found that this technique effectively reduces reconstruction uncertainty in the resulting images. Athanasiadis et al. [195] developed a generative adversarial network to explore the relationship between visual and audio emotional expressions. They employed conformal prediction to obtain calibration error and confidence values during testing, and reported an approximately 2% increase in classification accuracy on two public datasets. In a separate study, Ayhan et al.
[196] developed and trained a convolutional neural network on fundus images for diagnosing diabetic retinopathy. They calculated the variance using entropy as a measure of the distribution of predicted probabilities. The reported accuracy ranged from 96% to 98%, highlighting the effectiveness of their approach in achieving accurate diagnoses. Araujo et al. [197] used convolutional batch-normalization blocks and max-pooling layers to assess the severity of diabetic retinopathy in retinal images. The authors employed Cohen's kappa statistics to evaluate the model's predictions at different uncertainty threshold levels by calculating the variance in image-wise retinopathy grade probabilities. They concluded that the best results were achieved using the quadratic-weighted Cohen's kappa, ranging from 0.71 to 0.84. Abdar et al. [198] developed a hybrid deep model for skin cancer image classification. They explored three uncertainty metrics, including Monte Carlo dropout and Deep Ensembles. The results demonstrated that the proposed model achieved the highest accuracy of approximately 91% and showed potential for effective use in various stages of medical image analysis. Scalia et al. [199] employed graph convolutional neural networks for predicting molecular properties. The authors quantified prediction uncertainty using Monte Carlo dropout, deep ensembles, and bootstrapping methods on four datasets; the deep ensemble consistently outperformed the other techniques. Dong et al. [200] introduced a novel deep neural network model called RCoNet, designed for COVID-19 detection in chest X-ray (CXR) images. The model generates both the final diagnosis and an uncertainty estimate, and it was tested on both the original dataset and a corrupted dataset containing varying percentages of fake samples. In the presence of noise within the data, the proposed method displayed superior effectiveness compared to existing approaches. Cortes-Ciriano et al.
[201] utilized ensembles of several deep learning models to examine the effectiveness of a substance in inhibiting a biochemical or biological function. They monitored the model parameters during single-network optimization and calculated the variability and validation residuals across snapshots to quantify prediction uncertainty. The findings revealed a strong relationship between confidence levels and the percentage of confidence intervals, indicating accurate bioactivity estimation. In a separate study, Cortes-Ciriano et al. [202] utilized deep neural networks and the random forest classifier for the same bioactivity-prediction problem. They employed conformal prediction, along with test-time dropout, to compute prediction errors on various prediction combinations, and concluded that a robust correlation existed between confidence levels and error rates. KarAzmoudeh et al. [203] introduced Bayesian approximation and ensemble learning techniques as uncertainty quantification methods for classifying breast tumor tissue. They demonstrated that, by employing evaluation criteria based on uncertainty estimation, it is possible to determine when to trust the output of a deep neural network; furthermore, the Bayesian Ensemble model displayed greater reliability in quantifying uncertainty. Graham et al. [204] introduced a model for segmenting colon histology images, applying random image transformations at test time for uncertainty quantification. They also presented an uncertainty-based score to assess prediction reliability. The model exhibited excellent performance in segmenting both gland lumen and gland objects across datasets from different centers.
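Conformal prediction, as employed by Athanasiadis et al. [195] and in [202], can be illustrated with a minimal split-conformal sketch for classification. The calibration data below are simulated, not taken from any cited study:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal prediction: calibrate a nonconformity threshold so that
    prediction sets cover the true label with probability >= 1 - alpha."""
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(probs, qhat):
    """All classes whose nonconformity falls below the calibrated threshold."""
    return np.where(1.0 - probs <= qhat)[0]

rng = np.random.default_rng(1)
# Hypothetical calibration data: softmax outputs of some classifier, with the
# model being right about 80% of the time.
cal_probs = rng.dirichlet(np.ones(3) * 2, size=200)
cal_labels = np.array([p.argmax() if rng.random() < 0.8 else rng.integers(3)
                       for p in cal_probs])

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
confident = prediction_set(np.array([0.95, 0.03, 0.02]), qhat)
ambiguous = prediction_set(np.array([0.40, 0.35, 0.25]), qhat)
# Ambiguous outputs tend to produce larger prediction sets, signalling uncertainty.
```

The size of the prediction set acts as the uncertainty signal: a singleton set is a confident call, while a multi-class set can trigger deferral to a clinician.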

Discussion
This paper discusses the importance of incorporating uncertainty estimation techniques in healthcare applications of machine and deep learning models. While Explainable AI (XAI) is a growing area of research, relying solely on XAI techniques cannot guarantee the reliability of model decisions. To promote safe decision-making in the medical domain, it is crucial to present uncertainty estimates in AI systems. Fig. 4 shows the main advantages of using UQ in AI models.

Uncertainty in machine learning frameworks
In machine learning, several approaches are being investigated to address decision-making under uncertainty, including Bayesian networks, fuzzy logic, Monte Carlo simulation, and Dempster-Shafer theory [205]. Bayesian networks rely on the concept of conditional independence to compute values of the joint distribution over the random variables in a specific domain. Dempster-Shafer theory, on the other hand, quantitatively evaluates uncertainties through subjective expert assessments of statement reliability. Fuzzy logic assigns values to elements using membership functions that represent their degree of belongingness to a fuzzy set, with subjective probability distributions assigned to these fuzzy sets [205]. For medical decisions, Bayesian networks and fuzzy logic are preferred due to their ability to represent medical knowledge in a structured manner and to efficiently utilize prior probabilities for problem-solving [205]. These concepts are summarized in Table 2, which highlights Bayesian models and Dempster-Shafer theory as the primary methods for uncertainty estimation in healthcare using machine learning techniques. This suggests that these approaches have proven effective in handling uncertainty in medical data and improving prediction accuracy. Additionally, uncertainty techniques utilizing machine learning models have primarily been applied to neurological systems, followed by thoracic systems (such as cardiac systems), medical data, and other organs (with breast cancer detection being the most extensively studied) (Fig. 5). Uncertainty plays a significant role in machine learning, particularly in the analysis of clinical data (62%) and biomedical images (24%). Flügge et al.
[32] investigated diagnostic inference under uncertainty using Bayesian networks. Their model was tested on real-world medical history data, and the information derived from the Bayesian networks can be applied beyond the mere determination of diagnostic probabilities for a given medical history. Lipkova et al. [31] demonstrated a Bayesian machine learning framework that utilizes high-resolution MRI scans and highly specific FET-PET metabolic maps to design personalized radiotherapy plans and estimate tumor cell density in patients with glioblastoma. This approach offers a promising avenue for individualized treatment planning and could lead to improved clinical outcomes. Razi et al. [49] is the only study that investigates uncertainty in signal processing models, demonstrating a new method for classifying motor imagery tasks based on Dempster-Shafer theory.
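Dempster-Shafer fusion, as used in the motor-imagery study above, rests on Dempster's rule of combination. A minimal sketch follows, with hypothetical belief masses rather than values from the cited work:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination: fuse two basic mass assignments
    (dicts mapping frozenset hypotheses to masses) into one, renormalising
    by the conflict between the two sources."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to disjoint hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Hypothetical example: two evidence sources assign belief masses over the
# motor-imagery classes {left, right}, keeping some mass on the whole frame
# of discernment (i.e., "don't know").
L, R = frozenset({"left"}), frozenset({"right"})
theta = L | R
m1 = {L: 0.6, R: 0.1, theta: 0.3}
m2 = {L: 0.5, R: 0.2, theta: 0.3}

fused = dempster_combine(m1, m2)  # belief in "left" grows; residual mass on theta
```

The mass left on the full frame after fusion is an explicit, interpretable representation of remaining ignorance, which is what distinguishes this formalism from a plain probabilistic average.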
Integrating uncertainty measurement into machine learning frameworks offers multiple benefits. It improves decision-making by providing insights into prediction confidence, enhances model robustness by detecting out-of-distribution inputs, and facilitates interpretability, building trust and allowing experts to validate model decisions. Additionally, uncertainty-aware frameworks support efficient data acquisition strategies. Overall, uncertainty measurement empowers users with reliable predictions while promoting robust, interpretable models.

Uncertainty in deep learning frameworks
Fig. 6 shows the different types of images studied with uncertainty techniques used in healthcare based on deep learning frameworks. The analysis reveals that brain, eye, and skin images have been the most extensively studied in the past decade, followed by chest, cardiac, and breast images. However, limited instances of research exist for liver, spleen, gastrointestinal tract, muscle, audio-visual, and cell membrane images, possibly due to challenges related to biological variability, imaging modality, and expert annotation [206]. This variability may explain why brain, eye, and skin images are more commonly associated with uncertainty techniques compared to other medical images. Non-imaging data, such as physiological signals and the bioactivity of proteins, have received limited attention in the literature. Only a few studies, such as Heo et al. [110], examining data and model uncertainties using multiple physiological signals, and Jahmunah et al. [156], investigating model uncertainty using ECG signals, are mentioned in this review paper. These findings indicate that while uncertainty techniques have been extensively explored in the context of medical images, their application to non-imaging data is still emerging in the healthcare domain. Indeed, only a few studies have focused on uncertainty applied to physiological signals, such as ECG signals [156]. Fig. 7 presents a pie chart illustrating the use of deep learning models with uncertainty techniques in healthcare. The analysis reveals that approximately half of the studies incorporated uncertainty techniques into convolutional neural networks (CNN), followed by Bayesian-based deep learning models. Deep CNN models are effective in learning useful representations of images and structured data [207], while Bayesian neural networks are effective in describing model uncertainties while requiring low memory consumption [4]. In contrast, models such as autoencoders and ensemble models were used to a lesser extent. Ensemble methods do not effectively describe model uncertainties, require training many networks, and incur high computational effort and memory consumption [4]. Autoencoders employ the axis-aligned Gaussian as the latent distribution, which may be disadvantageous when estimating complex latent posterior distributions [127] in uncertainty techniques. Consequently, CNN and Bayesian deep models are more prominently utilized with uncertainty techniques in image-related applications, due to their inherent strengths and advantages over autoencoders and ensembles, as discussed.

S. Seoni et al.

Fig. 4. The main advantages of using UQ in AI models.
Fig. 5. Types of diseases most prevalently studied involving uncertainty techniques using machine learning models in healthcare (Table 2). The medical dataset represents works that utilize different combined datasets or non-specific datasets, such as EHR.
Fig. 6. Bar graph representing the different types of images used in Table 3.

Key papers and techniques in the field of uncertainty quantification
In the realm of machine learning, all methods for uncertainty quantification are roughly equally represented (as shown in Table 2). However, in the domain of deep learning, Bayesian methods are the most widely used, with 60% of the papers (n = 68) identified in our review employing them (as shown in Table 3). Fig. 8 illustrates the various types of data employed alongside Bayesian methods, the predominant approach for UQ; thoracic system data is the most prevalent, followed by nervous system data. Among the Bayesian methods, the MC dropout technique is the most popular, for several reasons. Firstly, the implementation of MC dropout is relatively straightforward: unlike other techniques for quantifying uncertainty, it only requires enabling the dropout layers at test time to obtain uncertainty maps. This makes it a convenient and accessible method for many researchers and practitioners. Secondly, MC dropout is highly flexible and can be implemented in most deep neural networks simply by adding dropout layers within the architecture, making it a versatile tool applicable to a wide range of models. Finally, once uncertainty maps are obtained using MC dropout, they can be used to customize the pipeline by imposing fixed or adaptive thresholds based on the level of uncertainty in the prediction. This enables a range of applications, such as refining semantic segmentation, correcting misclassified data, and improving model calibration.
Enabling the dropout layers during inference allows the model to make slightly different predictions for the same input each time. The variance between these predictions can be leveraged both in classification tasks (e.g., image and signal classification) to improve model accuracy and in segmentation tasks to generate uncertainty maps. By sampling multiple predictions during inference, the model can capture some of the uncertainty in its outputs. This Monte Carlo sampling approach has two main benefits:
- For classification tasks, averaging the predictions of multiple dropout samples can improve model accuracy compared to a single prediction.
- For segmentation tasks, the variance in the segmentation masks generated from different dropout samples provides a natural measure of uncertainty for each pixel or region. This produces an uncertainty map that highlights areas where the model is less confident.
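The sampling scheme described above can be sketched as follows. The two-layer network and its weights are toy stand-ins for a trained model; the pattern (dropout kept active at inference, many stochastic passes, mean and variance over the samples) is what the technique prescribes:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, W1, W2, p_drop=0.5):
    """One stochastic forward pass: the dropout mask is resampled on every
    call, so repeated calls give different predictions for the same input."""
    h = np.maximum(0, W1 @ x)                 # hidden layer (ReLU)
    mask = rng.random(h.shape) >= p_drop      # MC dropout: kept active at test time
    h = h * mask / (1 - p_drop)               # inverted-dropout scaling
    return softmax(W2 @ h)

# Hypothetical toy weights standing in for a trained network.
W1 = rng.normal(0, 1, (16, 4))
W2 = rng.normal(0, 1, (3, 16))
x = rng.normal(0, 1, 4)

samples = np.stack([forward(x, W1, W2) for _ in range(100)])  # T = 100 passes
mean_pred = samples.mean(axis=0)    # averaged prediction (classification)
uncertainty = samples.var(axis=0)   # per-class variance (uncertainty map)
```

In a segmentation network, the same variance computed per pixel over the T sampled masks yields the uncertainty map discussed above.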
Here we discuss the key papers that have used MC dropout to improve their AI-based frameworks. Gal and Ghahramani [101] first proposed using MC dropout during inference to approximate Bayesian prediction intervals for neural networks. They showed that averaging the predictions from multiple dropout samples leads to improved classification accuracy and better-calibrated uncertainty estimates.
Jahmunah et al. [156] quantified uncertainty in an ECG model using MC dropout. This study developed a DenseNet model for myocardial infarction diagnosis from ECG signals that can quantify predictive uncertainty. Predictive entropy is computed from the model's predictive probabilities and used as an uncertainty measure to detect misclassifications caused by out-of-distribution data. The results show that (i) the model's uncertainty sensitivity increases as noise decreases, indicating increased confidence in predictions; and (ii) the model achieves high uncertainty accuracy and precision when SNR values are high, indicating it is aware of what it knows. Overall, MC dropout enables the model to estimate its predictive uncertainty, which allows it to detect misclassifications and indicate a lack of confidence when appropriate; this uncertainty awareness improves the model's reliability. Combalia et al. [121] likewise used MC dropout together with test-time techniques for uncertainty estimation. Overall, the MC dropout technique provides a practical and flexible approach to uncertainty quantification in deep learning, which is why it is the most used Bayesian method in this field.
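The predictive-entropy measure used in [156] is simple to compute from MC samples. The probability samples and the rejection threshold below are made up purely for illustration:

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Predictive entropy of the MC-averaged class probabilities, usable to
    flag unreliable (e.g., out-of-distribution) predictions.

    mc_probs: array of shape (T, n_classes), one row per stochastic pass.
    """
    p = mc_probs.mean(axis=0)
    return -np.sum(p * np.log(p + 1e-12))   # small epsilon avoids log(0)

# Hypothetical MC-dropout outputs for a confident and an uncertain case.
confident = np.array([[0.97, 0.03], [0.95, 0.05], [0.98, 0.02]])
uncertain = np.array([[0.55, 0.45], [0.40, 0.60], [0.52, 0.48]])

h_low, h_high = predictive_entropy(confident), predictive_entropy(uncertain)

threshold = 0.5          # assumed rejection threshold, tuned on validation data
accept = h_low < threshold    # low entropy: trust the prediction
reject = h_high >= threshold  # high entropy: refer the case to a clinician
```

Thresholding the entropy in this way is how a model can defer uncertain cases, the behavior credited with improving reliability in the study above.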

Application areas
Fig. 9 illustrates the most employed methods in machine learning and deep learning for analyzing different anatomical regions. For models analyzing the thoracic and nervous systems, Bayesian inference emerges as the most utilized technique for uncertainty management. For the digestive system, Dempster-Shafer theory (DST) is the predominant technique, while Monte Carlo simulation is the most used for analyzing diverse dataset collections. In deep learning, the Bayesian method remains the primary technique across all analyzed organ systems.
Fig. 10 presents a sunburst diagram highlighting the prevalent deep models for the four most extensively studied image types in Fig. 6. Our analysis reveals a notable trend in the utilization of uncertainty techniques, with CNN models being the most employed for studying brain images, followed by Bayesian-based deep models. A similar pattern emerges for eye images, where CNN models are predominantly favored, closely followed by Bayesian-based deep models. For skin images, CNN models also hold a widespread preference, while Bayesian-based deep models are the preferred choice for analyzing chest images.
Integrating uncertainty measurement into deep learning frameworks in healthcare can provide several benefits. It can help improve the reliability and interpretability of the model's predictions, enable better decision-making by clinicians, and enhance patient safety by highlighting areas of uncertainty in the model's output. Notably, there has been a growing number of studies applying uncertainty quantification in healthcare starting from 2021, which may be attributed to the rising adoption of uncertainty visualization techniques as reported in recent literature [208].
Based on the information presented in Table 2, it is evident that uncertainty techniques are commonly combined with machine learning approaches to capture and represent uncertainty in either the data, the models, or both. As a result, many studies focus on providing qualitative results, with only a few exploring methods for quantifying uncertainty. However, when examining deep learning approaches combined with uncertainty techniques, the emphasis is primarily on quantifying the inherent uncertainty in the data or the model, offering practical solutions for managing uncertainty in real-world medical systems. It is important to note that most studies investigate model uncertainties, followed by uncertainties in both the model and the data. The relatively fewer investigations into data uncertainties may be attributed to the fact that model uncertainties can be mitigated by improving the model architecture, the learning process, and the quality of the training data, whereas data uncertainties are inherent and cannot be reduced [4]. As a result, researchers often prioritize refining their models to reduce uncertainties rather than exploring approaches to enhance training performance on noisy data. Finally, it should be emphasized that many authors who study model uncertainty employ the Monte Carlo dropout technique, which is computationally expensive [209] and may pose limitations in healthcare settings where timely and rapid diagnoses are crucial.
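The model-versus-data distinction above corresponds to the common epistemic/aleatoric decomposition of predictive uncertainty. A minimal sketch with toy probability samples (not drawn from any cited study):

```python
import numpy as np

def decompose_uncertainty(mc_probs, eps=1e-12):
    """Split predictive uncertainty from T stochastic passes (MC dropout or an
    ensemble) into an aleatoric part (expected entropy, inherent to the data)
    and an epistemic part (mutual information, reducible with better models
    or more data)."""
    p_bar = mc_probs.mean(axis=0)
    total = -np.sum(p_bar * np.log(p_bar + eps))                  # H[E[p]]
    aleatoric = -np.mean(np.sum(mc_probs * np.log(mc_probs + eps), axis=1))
    epistemic = total - aleatoric                                 # BALD score
    return total, aleatoric, epistemic

# Passes that agree but are individually unsure: mostly aleatoric (data noise).
noisy = np.array([[0.6, 0.4], [0.6, 0.4], [0.6, 0.4]])
# Passes that are individually sure but disagree: mostly epistemic (model doubt).
disagree = np.array([[0.95, 0.05], [0.05, 0.95]])

t1, a1, e1 = decompose_uncertainty(noisy)      # e1 is near zero
t2, a2, e2 = decompose_uncertainty(disagree)   # e2 dominates a2
```

The decomposition makes the discussion above operational: only the epistemic term can be driven down by better architectures, training, or more data, while the aleatoric term persists.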
This review study has some benefits and shortcomings, as discussed below:

Advantages
(i) This review summarizes recent research on uncertainty techniques in machine and deep learning models in healthcare.
(ii) The types of diseases that have been studied using machine learning with uncertainty techniques are discussed.
(iii) The frequency of machine learning methods used with uncertainty techniques in the past decade is examined.
(iv) The medical images most used with uncertainty techniques involving deep learning models in the past decade are identified.
(v) The deep learning models with uncertainty techniques most used for the top four studied image types in the past decade are identified.

Limitation(s)
(i) Uncertainty techniques used in healthcare involving animal or plant data were not considered in this review.

Future work
Based on the findings of the review, it is evident that further research is needed to explore uncertainty techniques in deep models for healthcare applications, particularly in relation to physiological signals. Estimating uncertainty is crucial for quantifying and effectively managing the inherent noise, interference, and imperfections present in 1D physiological data. This, in turn, improves the quality of measurements, resulting in more accurate and reliable outcomes. Additionally, uncertainty quantification has the potential to enhance the reliability of model predictions, even in scenarios involving missing or noisy data.
Existing studies have mainly focused on single-modality datasets, with very few considering multimodal data. Therefore, future investigations should delve into uncertainty techniques for multimodal data. In multimodal data involving diverse sources such as images, text, and physiological signals, uncertainty can arise from various factors, including sensor quality, measurement accuracy, and inherent variability across modalities. By employing uncertainty techniques, confidence levels can be quantified for outcomes derived from each distinct modality and from the integrated multimodal dataset as a whole. This contributes to the refinement of predictions, ensuring heightened precision and robustness in the results obtained.
Most current studies concentrate on binary classification or segmentation problems. It is therefore recommended that future works assess quantitative uncertainties in classification probabilities for multiclass data. Expanding the evaluation to multiclass scenarios would offer a more comprehensive understanding of uncertainty in classification tasks. These avenues of research have the potential to enhance the accuracy and reliability of uncertainty techniques in healthcare applications.
To compare various uncertainty quantification methods and determine which one performs best on a given task, it is necessary to test them all on the same dataset. However, this review highlights a significant heterogeneity in both the tasks and the datasets used across different studies. This variation makes it challenging to compare results and draw general conclusions. Furthermore, the use of different evaluation metrics and protocols across studies further complicates comparisons: some studies report only accuracy, while others report additional metrics such as precision, recall, or F1 score. Additionally, the choice of evaluation dataset can significantly impact the results, as some datasets may be more challenging or have different characteristics than others.
To address these issues, future studies may benefit from using benchmark datasets and common evaluation metrics to allow for more direct comparisons between different uncertainty quantification methods. Additionally, the organization of public challenges may help establish a standardized framework for evaluating the performance of these methods.
Future works could also investigate the following topics:
I. Development and exploration of UQ methods in AI models, especially ML models, where fewer studies exist. The field of machine learning offers several areas that warrant further exploration, including the development of new UQ methods. Despite notable advancements in UQ for machine learning, more methods still need to be proposed and explored, especially in healthcare [210].
II. Fusion-based methods for enhancing AI techniques. Fusion-based methods, which combine multiple sources of information, offer a promising avenue for improving both predictions and uncertainty estimation in machine learning. Investigating fusion-based approaches further can provide insights into their potential benefits and applications [20], especially in healthcare [211,212].
III. Leveraging new theories for uncertainty quantification. The introduction of new theories can provide valuable frameworks for uncertainty quantification in machine and deep learning. For example, three-way decisions offer a decision-making approach that considers acceptance, rejection, and uncertainty as possible outcomes, making it a useful UQ method for tackling uncertain scenarios [213]. Similarly, info-gap decision theory provides a foundation for decision-making under severe uncertainty, where precise knowledge of the model or parameters may be lacking [214,215].
IV. Application of transfer learning techniques for uncertainty quantification. When data availability is limited, transfer learning becomes relevant for uncertainty quantification: knowledge and patterns acquired from a source domain with abundant data can be leveraged to enhance learning in a target domain with fewer samples. Investigating the effectiveness of transfer learning in the context of UQ can provide valuable insights and potential benefits.
V. Handling uncertainty in Graph Neural Networks (GNNs) and Graph CNNs. The advent of GNNs and Graph CNNs has introduced new challenges and opportunities in uncertainty quantification. These specialized architectures facilitate learning from graph-structured data, but efficient handling of uncertainty in such models requires innovative methods specifically designed for GNNs [216] and Graph CNNs [217]. A review of existing techniques in these domains can offer valuable insights and inform the development of novel approaches.
VI. Enhancing uncertainty calibration approaches in machine learning. Uncertainty calibration approaches play a pivotal role in machine learning by ensuring that predicted uncertainties align with empirical error rates, enabling dependable decision-making. Proposing novel uncertainty calibration methods can enhance the precision and utility of uncertainty estimates, and a review of the pertinent literature can serve as a foundation for identifying established calibration methods.
VII. Improving the accessibility of public data. The accessibility of public data is essential for advancing machine learning research and promoting collaboration. Access to diverse and well-curated datasets enables researchers to benchmark and compare methods, ensuring reproducibility and fostering further progress in the field. Efforts should therefore be directed toward encouraging the release and sharing of public data, supporting initiatives such as open data platforms and collaborative data-sharing communities.
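As an illustration of point VI, a standard calibration metric is the expected calibration error (ECE). The predictions below are hypothetical and serve only to show how over-confidence is exposed:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence and average the gap between each
    bin's mean confidence and its empirical accuracy, weighted by bin size."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap     # weight by fraction of samples in bin
    return ece

# Hypothetical predictions: a model that is 90% confident but only 60%
# accurate is over-confident, and ECE exposes the mismatch.
conf = np.array([0.9] * 10)
hits = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])            # 60% accuracy
ece_overconfident = expected_calibration_error(conf, hits)

calibrated_hits = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0])  # 90% accuracy
ece_calibrated = expected_calibration_error(conf, calibrated_hits)
```

A low ECE means the predicted confidences can be read as real probabilities, which is precisely the property clinical decision support requires.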

Conclusion
AI models are increasingly being utilized in healthcare, emphasizing the need to assess the reliability and safety of these systems. A crucial aspect of this assessment involves quantifying the uncertainty in the predictions made by AI models. This study systematically reviewed recent research that employed uncertainty techniques in healthcare applications of machine and deep learning, adhering to PRISMA guidelines.
This review identified Bayesian methods as the primary uncertainty techniques used in healthcare. Moreover, UQ techniques were more prevalent in healthcare applications using deep learning models than in those using traditional machine learning models. These findings provide valuable insights for advancing UQ research in healthcare and improving the reliability and safety of AI systems in this critical field.
Quantifying uncertainty in clinical AI implementations offers several advantages, including improved model accuracy through reduced misclassifications, identification of uncertain cases, enhanced model reliability and safety, and increased confidence among clinical operators, leading to greater acceptance and usage. The results of this study pave the way for future investigations in uncertainty quantification, strengthening the reliability and safety of AI systems in healthcare. Future studies could explore the examined UQ techniques on 1D physiological signals, encompassing multiclass or multimodal data, to further enhance UQ implementation. Additionally, comparing different uncertainty quantification techniques using standardized datasets and consistent metrics would enable a comprehensive analysis of these methods.
It is worth noting that this review focused solely on uncertainty techniques applied to healthcare data and did not include uncertainty techniques in animal or plant data or non-healthcare-specific applications.Future analyses could be conducted to incorporate these aspects and provide a more comprehensive understanding of uncertainty techniques across various domains.• Modified adaptive neuro-fuzzy inference system (M-ANFIS) • Entropy Big Healthcare Data (Patient portals, research studies, electronic health records, wearable devices, etc) The proposed technique performs better than other machine learning techniques Kaur et al. [43] 2020 • Adaptive neuro-fuzzy inference system • Neuro-Fuzzy system Medical data: osteoarthritis (OA) rheumatoid arthritis (RA) and osteonecrosis (ON) diseases.
(Unattributed row) • Proposed system outperforms the fuzzy system in areas such as accuracy, sensitivity, and specificity.
Sood et al. [219], 2020 • Linear Discriminant Analysis-Adaptive Neuro-Fuzzy Inference System (LDA-ANFIS) • Dengue-related data (10 attributes, such as fever and pain behind the eyes) and heart data (6 parameters, such as ECG, HDL, and sex) • The proposed method demonstrates efficient performance and uses several experimental and statistical methods.
(Unattributed row) • Study of the impact of a reduction in sugar-sweetened beverage consumption on cardiovascular disease and diabetes.
Lee et al. [36], 2020 • Monte Carlo simulation using the Geant4 application • Images acquired on two breast phantoms with a Si/CZT Compton camera imaging system • The proposed method confirmed the feasibility of using a Compton camera for the detection of breast cancer.
Shih et al. [37], 2020 • Monte Carlo simulation • Blood irradiator simulated using Monte Carlo simulation and a MAGAT gel dosimeter • The proposed method can be used to ensure blood products achieve an accurate delivery dosage.
Gasparini et al. [222], 2020 • Monte Carlo simulation • 49 care practices of adults with chronic kidney disease • The proposed framework using Monte Carlo simulation supports the visitation process with respect to health care utilization analysis.
Zouh et al. [28], 2019
[193], 2019 • Test-time augmentation
[98], 2019 • Single deterministic method • In-house database: 1600 mammographies • The proposed method allows the rejection of the most obvious outliers and improved area-under-the-curve results by up to 10%.
Jensen et al. [99], 2019
[92], 2020 • Single deterministic models • 4 cardiac magnetic resonance image datasets • The recommended method yields the best quantification accuracy and optimization results.
Kwon et al. [126], 2020 • Bayesian neural network
[169], 2020 • Bayesian probability approach • Cardiac MRI • Quantifying atrial anatomy uncertainty from clinical data and its impact on electrophysiology simulation predictions.
Tanno et al. [138], 2021 • The results demonstrate the robustness, transparency, and confidence of the proposed model.
Wang et al. [168], 2022 • Bayesian probability approach • 200 CT images • A probabilistic generative approach combining shape and intensity models for cochlear segmentation in CT images.
Abdar et al. [161], 2023 • The proposed model reliably presents diagnostic information, making it suitable for healthcare applications.
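Several of the rows above (Lee et al. [36], Shih et al. [37], Gasparini et al. [222]) rely on Monte Carlo simulation, whose underlying recipe is generic: sample each uncertain input from an assumed distribution, push the samples through the system model, and summarize the spread of the outputs. The following is a minimal sketch of that recipe with a hypothetical dose-delivery model; the function and all distribution parameters are illustrative assumptions, not taken from any cited study.

```python
import random
import statistics

def delivered_dose(activity, attenuation):
    """Hypothetical dose model: source activity reduced by an attenuation factor."""
    return activity * (1.0 - attenuation)

def monte_carlo_dose(n=100_000, seed=1):
    """Sample the inputs, propagate through the model, summarize the output spread."""
    random.seed(seed)
    samples = []
    for _ in range(n):
        activity = random.gauss(25.0, 0.5)        # assumed source activity, sd 0.5
        attenuation = random.uniform(0.02, 0.06)  # assumed attenuation fraction
        samples.append(delivered_dose(activity, attenuation))
    return statistics.fmean(samples), statistics.pstdev(samples)

mean, spread = monte_carlo_dose()
print(mean, spread)
```

The standard deviation of the sampled outputs plays the role of the combined standard uncertainty of the simulated quantity.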

Fig. 1. Different sources of uncertainty in healthcare, such as variability in measurements, incomplete or missing data, uncertainty in medical diagnosis, and uncertainty in medical treatment.

Fig. 2. Comparison between our review paper and the existing literature reviews. **ML and DL denote machine learning and deep learning.

Fig. 3. Selection of relevant articles based on PRISMA guidelines.

Fig. 7. Pie chart representing the different types of deep learning models used to model uncertainty in deep learning frameworks.

Fig. 8. Bar graph representing the different types of images used with the Bayesian methods.

Fig. 9 illustrates the most employed methods in machine learning and deep learning for analyzing different anatomical regions. For models analyzing the thoracic and nervous systems, Bayesian inference emerges as the most utilized technique for uncertainty management. In the case of the digestive system, Dempster-Shafer theory (DST) is the predominant technique, while Monte Carlo simulation is the most used for analyzing diverse dataset collections. Regarding deep learning, the Bayesian method remains the primary technique across all analyzed organ systems. Fig. 10 presents a sunburst diagram highlighting the prevalent deep models for the four most extensively studied image types in Fig. 6. Our analysis reveals a notable trend in the utilization of uncertainty techniques, with CNN models being the most employed for studying brain images, followed by Bayesian-based deep models. A similar pattern emerges for eye images, where CNN models are predominantly favored, closely followed by Bayesian-based deep models. In the case of skin images, CNN models hold a widespread preference, while Bayesian deep models are the preferred choice for analyzing chest images. Integrating uncertainty measurement into deep learning frameworks in healthcare can provide several benefits. It can help improve the reliability and interpretability of the model's predictions, enable better decision-making by clinicians, and enhance patient safety by highlighting areas of uncertainty in the model's output. Fig. 11 illustrates the use of uncertainty techniques in the healthcare domain, incorporating machine learning and deep learning methods from 2013 to 2023. The graph reveals a growing trend in studies examining uncertainty using both machine learning and deep learning approaches throughout the years. Notably, there has been a significant increase in the application of uncertainty techniques in healthcare, particularly in 2019 and 2020, possibly driven by the need to analyze and detect conditions related to COVID-19 complications. However, there has been a decline in the number of studies focusing on uncertainty.
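One concrete way to "highlight areas of uncertainty in the model's output", as described above, is to compute the predictive entropy of each class-probability vector and refer high-entropy cases to a clinician instead of acting on them automatically. Below is a minimal sketch of such a triage rule; the probability vectors and the threshold value are made-up illustrations, not from any reviewed study.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def triage(probs, threshold=0.5):
    """Return 'refer' when the model is too uncertain for automatic use."""
    return "refer" if predictive_entropy(probs) > threshold else "accept"

print(triage([0.97, 0.02, 0.01]))  # confident prediction -> accept
print(triage([0.40, 0.35, 0.25]))  # ambiguous prediction -> refer
```

In practice the threshold would be tuned on validation data, trading off automation rate against the risk of acting on uncertain predictions.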

Fig. 10. Sunburst diagram detailing the deep models most prevalently employed for the four top image types studied in Table 3. **The term 'Bay' refers to Bayesian-based deep models.

Fig. 11. Bar graphs of uncertainty techniques involving machine learning (top graph) and deep learning (bottom graph) from 2013 to 2023. **The term 'DL' refers to deep learning models, while 'ML' refers to machine learning models.

Table 2
Summary of the number of papers that employed uncertainty quantification techniques in machine learning frameworks.

Dempster-Shafer theory + Fuzzy logic
**N: Number of articles.

[36] presented a GPU-based microscopic Monte Carlo simulation tool for analyzing the DNA damage induced by ionizing radiation. Their work did not revolve around the development of a new chemical or physical model, but rather focused on a GPU-based implementation aimed at reducing computational cost. Lee et al. [36] used a Monte Carlo simulation based on the Geant4 application, confirming the feasibility of using a Compton camera for the detection of breast cancer.

Table 3
Summary of the number of papers that employed uncertainty quantification techniques in deep learning frameworks.

…confidence intervals. Tardy et al. [98] utilized a deep neural network classifier for the classification of mammogram images. The authors estimated network uncertainty using two measurements: subjective logic with softmax predictions and the Mahalanobis distance between new and training samples in the embedding space, for three different tasks. They reported that the proposed method allows for the rejection of obvious outliers and improves area-under-the-curve results by up to 10%. Jensen et al. [99] employed a convolutional neural network for skin image classification. They investigated the use of inter-rater variability sampling during training to improve model calibration.
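Model calibration of the kind Jensen et al. [99] investigated is commonly summarized with the expected calibration error (ECE): predictions are binned by confidence, and the gap between average confidence and empirical accuracy is averaged over bins. A minimal sketch follows; the binning scheme and the toy data are illustrative, not drawn from the reviewed study.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; average the |accuracy - confidence| gap per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Well calibrated: 80%-confident predictions that are correct 8 times out of 10.
print(round(expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2), 6))  # 0.0
```

A model whose 80%-confident predictions are right only half the time would instead contribute a 0.3 gap for that bin, inflating the ECE.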
**N: Number of articles.

The study in [140] used a deep CNN model and explored Bayesian uncertainty estimates and ensemble semi-supervised learning for correcting noisy labels in upper gastrointestinal images. The proposed method effectively improved recognition accuracy for both authentic and noisy clinical data. Wang et al. [136] employed a unique approach, implementing a Bayesian teacher-student deep model with Monte Carlo dropout to estimate segmentation and feature uncertainty in atrial MRI and kidney CT scan images. The authors found that their proposed method outperformed existing semi-supervised uncertainty estimates on both datasets, demonstrating its effectiveness in uncertainty estimation. Bian et al. [137] combined a segmentation network with a Conditional Variational Autoencoder (CVAE) for uncertainty estimation, using the variance of the network's output as a measure of uncertainty. They proposed an Uncertainty-aware Cross Entropy (UCE) loss to leverage uncertainty information and improve segmentation performance in highly uncertain regions. The findings demonstrated that the proposed method outperformed existing methods for unsupervised domain adaptation tasks. Tanno et al. [138] combined a noise model with Bayesian inference for uncertainty estimation in brain tumor image datasets. Their results demonstrated that measuring uncertainty improved prediction performance and enabled the detection of predictive failures. Additionally, the decomposition of predictive uncertainty provided high-quality explanations of model performance. Thiagarajan et al. [139] employed and compared Bayesian-based and transfer learning CNN models for uncertainty estimation in breast histopathology images. The findings showed that the Bayesian CNN model outperformed existing models and was useful in explaining uncertainties in histological images. Ghosal et al. [140] introduced two innovative techniques, Monte Carlo DropWeight and Bayesian Residual UNet, specifically designed for estimating aleatoric and epistemic uncertainty. By employing these methods, the authors were able to accurately estimate uncertainty, significantly boosting the confidence of clinicians in the field of semantic segmentation. Edupuganti et al. [141] employed variational autoencoders and convolutional neural network models to quantify uncertainty in MRI segmentation of knee images. They utilized Monte Carlo sampling to create a posterior of image pixel variance maps and achieved a SURE-MSE (Stein's Unbiased Risk Estimator) value of 0.97 for 2-fold under-sampling. Valliuddin et al.
[159] utilized deep Residual Inception Networks to investigate aleatoric and epistemic uncertainties in 12-lead electrocardiogram signals. The authors concluded that variational inference with Bayesian decomposition and an ensemble with auxiliary output performed best, but high uncertainty in deep neural network-based ECG signal classification correlated with lower diagnostic agreement compared with the interpretation of cardiologists. Sieradzki et al. [147] employed a deep generative model with recurrent neural networks for compound bioactivity prediction. They utilized dropout-based uncertainty estimation by passing test samples through the network with weight dropout, measuring uncertainty from the variance. The proposed method achieved precision values between 0.0004 and 0.0007. A further study addressed brain MRI image classification to detect brain tumors; the authors computed the mean of the variance in a predicted posterior distribution obtained over repeated runs. Sedghi et al. [149] employed a CNN to assess model agreement in brain image registration. The authors computed the variance in displacements for various brain MRI images. This approach facilitated the estimation of local registration uncertainty, which helps identify areas where the two images may not align well and provides information to end-users about the registration quality. Norouzi et al. [150] employed fully convolutional neural networks for cardiac image segmentation and computed model uncertainty by estimating the variance of the model's output. They further enhanced segmentation accuracy using conditional random fields and assessed the proposed approach with three different metrics. The authors emphasized the incorporation of new techniques and the successful integration of simple ideas with deep neural networks. Filos et al. [151] conducted a systematic study comparing various uncertainty estimation methods using Bayesian deep learning techniques for diabetic retinopathy classification. Their research emphasized the importance of systematic comparisons to
demonstrate the efficacy of Bayesian deep learning techniques on large-scale problems. Ghoshal et al. [152] utilized a Monte Carlo DropWeights Bayesian Convolutional Neural Network (BCNN) model to estimate uncertainty in the predictions of deep learning models applied to chest X-ray images of patients with COVID-19. Their results revealed a correlation between uncertainty and prediction accuracy. Dolezal et al. [153] developed deep convolutional neural network models for the classification of lung adenocarcinoma and squamous cell carcinoma in out-of-distribution digital histopathological data. They estimated slide-level uncertainty for whole-slide images by applying uncertainty thresholding to generalize the handling of out-of-distribution data. The findings corroborated that high-confidence predictions outperform those made without uncertainty, and that uncertainty thresholding is a reliable approach for making high-confidence predictions on out-of-distribution lung adenocarcinoma and squamous cell carcinoma data. Mensah et al. [154] uniquely employed Bayesian capsule networks for uncertainty estimation on computer vision and chest X-ray image datasets using mean-field variational inference. They highlighted the transparency, credibility, reliability, and interpretability of Bayesian capsule networks in gaining the confidence of industry partners. Mazoure et al. [155] developed a distinctive web server for deep uncertainty estimation of skin lesion images, specifically for skin cancer detection. They compared the means and variances from new and traditional convolutional neural network models. The findings established that the proposed method outperforms other supervised, self-supervised, and uncertainty estimation techniques, making it the best-performing approach for skin cancer detection. Jahmunah et al. [156] employed the deep DenseNet model to estimate predictive entropy for the misclassification of normal and myocardial infarction ECG signals. Based on the obtained results, the
authors asserted that the proposed model is reliable, trustworthy, and confident in the diagnostic information it provides. Therefore, it holds great potential for utilization in healthcare applications. Stoean et al. [157] investigated the use of Monte Carlo dropout within the DL structure to automatically identify indicators of spinocerebellar ataxia type 2 from saccadic samples obtained from electrooculograms. Unlike the typical integration of this specific dropout method in deep neural networks, the researchers used the uncertainty derived from validation samples to construct a decision tree at the patient-register level. This decision tree, constructed from uncertainty estimates, achieved a classification accuracy of 81.18% in distinguishing control, presymptomatic, and sick classes. Da Silvia et al. [159] presented a Monte Carlo method-based approach to analyze the performance of measurement systems during design phases to improve their quality. They focused on a simulated electrocardiogram system, using measurement uncertainty as a performance parameter during the design process. The Monte Carlo method enabled the identification of the primary source of ECG measurement uncertainty, aiming for a better characterization of the metrological behavior of ECG measurements. Nasir et al. [160] introduced a model for the early prediction of type 2 diabetes mellitus (T2DM) using real-world electronic health record (EHR) data, which included historical diagnoses, patient vitals, and demographic information. By employing Monte Carlo dropout for uncertainty estimation, the proposed model demonstrated a 1.6% accuracy improvement compared with baseline techniques. Abdar et al.
[161] developed a novel deep learning model called UncertaintyFuseNet, specifically designed for the accurate classification of large CT scan and X-ray image datasets in COVID-19 cases. The model integrated the Ensemble Monte Carlo Dropout (EMCD) technique, which effectively estimated uncertainty during the learning process. The experimental results showcased the model's efficacy, with impressive prediction accuracies of 99.08% for CT scan datasets and 96.35% for X-ray datasets. Additionally, UncertaintyFuseNet displayed robustness to noise and reliable performance when applied to unseen data. MacDonald et al. [162] conducted a comparative analysis of three approximate Bayesian deep learning models for predicting cancer of unknown primary origin, using three RNA-seq datasets consisting of 10,968 samples across 57 cancer types. The study demonstrated that Bayesian deep learning is a promising approach for generalizing uncertainty, thereby improving the performance, transparency, and safety of deep learning models in real-world applications. Corrado et al. [169] demonstrated that quantifying the uncertainty of the atrial shape impacts the simulation of cardiac activation in the left atrium. Dhamala et al. [170] used direct Markov chain Monte Carlo for uncertainty estimation in personalized modeling with small-sized datasets, enhancing clinical decision-making reliability. The framework was evaluated in cardiac electrophysiological modeling using synthetic and real data experiments, revealing valuable parameter uncertainty insights through efficient surrogate modeling integration. Chen et al.
[171] introduced TransMorph, a cutting-edge model for unsupervised deformable image registration. Distinguished from traditional approaches, TransMorph leveraged the Transformer architecture and incorporated Bayesian deep learning to estimate deformation uncertainty without compromising registration performance. The model's validation on brain MRI images and phantom-to-CT images showcased superior accuracy compared with conventional methods. Abdullah et al. [172] proposed a study to assess uncertainty in multi-layer perceptron (MLP) Mixer models and CNN models for small datasets using Bayesian Deep Learning (BDL) techniques. Their results showed that BDL significantly improved MLP-Mixer performance by 9.2%-17.4% across various models. Dolezal et al. [173] introduced Slideflow, a versatile deep learning library for histopathologic image processing and visualization. This library integrates uncertainty estimation using Monte Carlo dropout into a variety of deep learning models for stain normalization, augmentation, and classification.
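Monte Carlo dropout, which recurs throughout the studies above, keeps dropout active at inference time and summarizes many stochastic forward passes: the sample mean serves as the prediction and the sample variance as an uncertainty estimate. The sketch below uses a toy linear "network" in place of a real model; the weights, dropout rate, and sample count are illustrative assumptions.

```python
import random
import statistics

def stochastic_forward(x, weights, p_drop=0.5):
    """One forward pass with dropout kept ON at test time (the MC-dropout trick)."""
    kept = [w for w in weights if random.random() > p_drop]
    scale = 1.0 / (1.0 - p_drop)              # inverted-dropout rescaling
    return sum(w * x for w in kept) * scale / len(weights)

def mc_dropout_predict(x, weights, n_samples=200, seed=0):
    """Summarize many stochastic passes: mean = prediction, variance = uncertainty."""
    random.seed(seed)
    samples = [stochastic_forward(x, weights) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)

mean, var = mc_dropout_predict(1.0, weights=[0.2, 0.4, 0.6, 0.8])
print(mean, var)
```

In a real framework the same effect is obtained by leaving the dropout layers in training mode during inference and running the model repeatedly on the same input.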
The authors of [148] developed convolutional neural network models for distinguishing between control, presymptomatic, and symptomatic classes. Guo et al. [158] proposed a technique to enhance multi-class segmentation of cardiac MRI by combining CNNs with interpretable machine learning algorithms. This approach demonstrated significant improvement over traditional CNN segmentation. Evaluations were performed on two distinct cardiac MRI datasets representing various cardiovascular pathologies, with the proposed model exhibiting increased segmentation accuracy and reduced variability. In a separate study, Farooq et al. [163] proposed a residual-attention-based, uncertainty-guided mean teacher framework that incorporated residual and attention blocks for breast cancer detection. The quantitative and qualitative findings showed that the proposed framework outperformed state-of-the-art techniques and surpassed existing methods for breast ultrasound mass segmentation. The study also highlighted the potential of including additional unlabeled data to enhance breast tumor segmentation performance. Abdar et al. [164] proposed a simple, yet novel, hierarchical attentive multilevel feature fusion model that leveraged uncertainty quantification during predictions in the classification task. By integrating dropout and Bayesian inference techniques, they effectively enhanced performance in terms of accuracy, recall, and precision for classification in OCT, lung CT, and chest X-ray images. Zakeri et al.
Wang et al. [168] proposed a probabilistic generative approach for CT image segmentation of cochlear structures. The framework balanced shape and appearance information using appearance likelihoods and prior label probabilities based on a generic shape function, showing promising results on multiple datasets. Corrado et al. [169] utilized Bayesian probabilistic methods to estimate left atrium anatomy from cardiac magnetic resonance images. The proposed model quantified uncertain left atrial shape, accounting for imaging artifacts, and assessed its impact on left atrial activation time simulations.

Table A2
Summary of studies on uncertainty estimation techniques in healthcare applications using deep learning approaches.

Single deterministic models • Autism brain images from 4 datasets (106, 175, 72, and 71 images, respectively) • The proposed method has the potential to determine disease-related biomarkers in neuroimaging data.

Ensemble models • MRI prostate lesion images from 60 patients suspected of having cancer; diffusion-weighted MRI images from 80 patients • Better image representations are obtained in segmentation for cancer characterization.
Wang et al. [136], 2020 • Teacher-student Bayesian deep model • Monte Carlo dropout • 100 MRI images of the left atrium; 210 CT scan images of the kidney • The proposed method outperforms existing semi-supervised uncertainty estimates on both datasets.
Ye et al. [97], 2020 • Monte Carlo dropout • Eighty-five EOG tests • The novel method integrates uncertainty quantification into decision trees using MC dropout, achieving 81.18% accuracy in classifying control, presymptomatic, and sick classes.
Guo et al. [158], 2020 • Monte Carlo dropout • The UKBB dataset: 3D cardiac images • Incorporating MCD uncertainty enhanced the segmentation performance of the model when applied to cardiovascular disease image data.
Guo et al. [188], 2020 • Ensemble models • Public CXR dataset (15,134 images) • The method combines label fusion, uncertainty-guided continuous kernel cut, and deep learning for accurate ventricle segmentation and function measurements in cardiac cine MRI.
Corrado et al. • Bayesian inference for uncertainty • 288 diffusion-weighted MRI images of the brain per subject; MRI images of 26 subjects; 2 healthy male MRI images; brain tumor + multiple sclerosis images
Dong et al. [200], 2021 • Test-time augmentation • Public CXR dataset (15,134 images) • The MUL method combines parallel dropout networks for accurate diagnoses and uncertainty estimations; the RCoNet model outperforms existing methods in all metrics.

Monte Carlo dropout • CT scan and X-ray images • UncertaintyFuseNet integrates EMCD for accurate classification of COVID-19 CT scan and X-ray images, achieving high accuracies of 99.08% and 96.35%.
Da Silvia et al. [159], 2023 • Monte Carlo dropout • ECG data • The Monte Carlo method was used to identify the primary source of ECG measurement uncertainty, improving the understanding of the metrological behavior of ECG measurements.
Nasir et al. [160], 2023 • Monte Carlo dropout • Real-world electronic health record (EHR) data • A model for early prediction of type 2 diabetes mellitus utilized real-world EHR data and incorporated Monte Carlo dropout for uncertainty estimation; the model showed a 1.6% accuracy improvement.
MacDonald et al. [162], 2023 • Monte Carlo dropout • Transcriptomic data: three RNA-seq datasets • Three Bayesian DL models were compared for cancer prediction; Bayesian DL has the potential to improve performance, transparency, and safety by effectively handling uncertainty in real-world applications.